Identification and characterisation of Short Interspersed Nuclear Elements in the olive tree (Olea europaea L.) genome

Barghini, Elena; Mascagni, Flavia; Natali, Lucia; Giordani, Tommaso; Cavallini, Andrea

doi:10.1007/s00438-016-1255-3

Identification and characterisation of Short Interspersed Nuclear Elements in the olive tree (Olea europaea L.) genome

Original Article
Published: 06 October 2016

Volume 292, pages 53–61, (2017)
Cite this article

Download PDF

Access provided by Autonomous University of Puebla

Molecular Genetics and Genomics Aims and scope Submit manuscript

Identification and characterisation of Short Interspersed Nuclear Elements in the olive tree (Olea europaea L.) genome

Download PDF

Elena Barghini¹,
Flavia Mascagni¹,
Lucia Natali¹,
Tommaso Giordani¹ &
…
Andrea Cavallini¹

623 Accesses
7 Citations
Explore all metrics

Abstract

Short Interspersed Nuclear Elements (SINEs) are nonautonomous retrotransposons in the genome of most eukaryotic species. While SINEs have been intensively investigated in humans and other animal systems, SINE identification has been carried out only in a limited number of plant species. This lack of information is apparent especially in non-model plants whose genome has not been sequenced yet. The aim of this work was to produce a specific bioinformatics pipeline for analysing second generation sequence reads of a non-model species and identifying SINEs. We have identified, for the first time, 227 putative SINEs of the olive tree (Olea europaea), that constitute one of the few sets of such sequences in dicotyledonous species. The identified SINEs ranged from 140 to 362 bp in length and were characterised with regard to the occurrence of the tRNA domain in their sequence. The majority of identified elements resulted in single copy or very lowly repeated, often in association with genic sequences. Analysis of sequence similarity allowed us to identify two major groups of SINEs showing different abundances in the olive tree genome, the former with sequence similarity to SINEs of Scrophulariaceae and Solanaceae and the latter to SINEs of Salicaceae. A comparison of sequence conservation between olive SINEs and LTR retrotransposon families suggested that SINE expansion in the genome occurred especially in very ancient times, before LTR retrotransposon expansion, and presumably before the separation of the rosids (to which Oleaceae belong) from the Asterids. Besides providing data on olive SINEs, our results demonstrate the suitability of the pipeline employed for SINE identification. Applying this pipeline will favour further structural and functional analyses on these relatively unknown elements to be performed also in other plant species, even in the absence of a reference genome, and will allow establishing general evolutionary patterns for this kind of repeats in plants.

Low coverage sequencing for repetitive DNA analysis in Passiflora edulis Sims: citogenomic characterization of transposable elements and satellite DNA

Article Open access 02 April 2019

Genome-wide analysis of LTR-retrotransposon diversity and its impact on the evolution of the genus Helianthus (L.)

Article Open access 18 August 2017

Comparative analysis of miniature inverted–repeat transposable elements (MITEs) and long terminal repeat (LTR) retrotransposons in six Citrus species

Article Open access 15 April 2019

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

Introduction

Short Interspersed Nuclear Elements (SINEs) are transposable elements (TEs) of eukaryotic genomes, i.e. sequences of DNA capable of moving from place to place in the genome and of replicating independently of the systematic replication of genomic DNA (Wicker et al. 2007). In plants, SINE identification has been achieved only in a limited number of species, and especially in model plants, whose genome was completely sequenced. The aim of this work was to produce a specific bioinformatics pipeline for analysing second generation sequence reads and identifying SINEs in a non-model species, Olea europaea.

TEs represent a large proportion of eukaryotic genomes and are subdivided into two major classes: DNA transposons and retrotransposons. DNA transposons encode a transposase enzyme catalysing the excision of the element and its integration into a new genomic location. Retrotransposons transpose through an RNA intermediate that is transcribed from a genomic copy of the element, then reverse-transcribed into DNA by a TE-encoded reverse transcriptase and reintegrated elsewhere in the genome. Consequently, at each replication cycle, retrotransposons increase their copy number in the host genome, resulting in a host genome expansion (Wicker et al. 2007).

Retrotransposons can be distinguished into LTR retrotransposons, which are flanked by long terminal repeats (LTRs), and non-LTR elements. The first are represented by two main superfamilies, Copia and Gypsy (Wicker et al. 2007), and account for the majority of plant nuclear genomes, but are less abundant in animals. Non-LTR retrotransposons include long interspersed nuclear elements (LINEs) and SINEs. LINEs and SINEs vary in prevalence and diversity in eukaryotes and are generally poorly represented in plants, but predominate over the LTR retrotransposons in many animals. For instance, the LINE L1 superfamily constitutes about 20 % of the human genome; and the Alu element, a SINE, has at least 500,000 copies (Rowold and Herrara 2000).

Intact LTR retrotransposons and LINEs encode the enzymes necessary for the reverse transcription and the integration of the new copy into the genome (autonomous elements); in nonautonomous elements, reverse transcription and integration rely on the enzyme machinery produced by other elements.

SINEs are short nonautonomous retroelements ranging in size up to 500 nucleotides, and their retrotransposition depends on proteins encoded by a partner LINE (Singer 1982; Boeke 1997; Dewannieux et al. 2003). The so-called ‘stringent’ SINEs have a unique, obligatory partner (Dewannieux et al. 2003), whereas others have more partners (Kajikawa and Okada 2002).

A SINE consists of two or more modules. The 5′-terminal head originates from one tRNA, 7SL RNA, or 5S rRNA, synthesised by the RNA polymerase III, and carries an internal promoter (as the corresponding RNAs). The central domain, the body, has an unknown origin or descends from a partner LINE and can contain a core domain that is shared with distant SINE families. The 3′-terminal tail is a sequence of variable length that consists of a poly(A) stretch, a poly(T) stretch, or of simple sequence motifs. In addition, two SINEs can combine into a dimeric SINE (Vassetzky and Kramerov 2013).

SINEs are divided into three types according to the RNA of origin. The majority of SINEs are ancestrally derived from tRNAs (Galli et al. 1981; Okada 1991; Sun et al. 2007). By contrast, some mammalian SINEs, such as B1 and Alu, originated from 7SL RNAs, while the zebrafish SINE3 is derived from 5S rRNA genes (Ullu and Tschudi 1984; Kapitonov and Jurka 2003). SINEs originated from 5S rRNA have been found also in a few mammals such as fruit bats (Gogolevsky et al. 2009) and springhare (Gogolevsky et al. 2008).

SINEs are transcribed by the cellular RNA polymerase III from the internal promoter allowing them to be expressed (Galli et al. 1981), although active external RNA polymerase II-related promoters have been found in some cases (Ferrigno et al. 2001). The SINEs promoters originated from tRNA or from 7SL RNA consist of two boxes (A and B) of about 11 nt, spaced by 30–35 nt, while those originated from 5S rRNA genes have three boxes: A, IE and C (Schramm and Hernandez 2002). The presence of the promoter within the transcribed sequence is critical for SINE amplification, as the promoter is preserved in new SINE copies.

The notion of ‘SINE family’ is widely used, but not clearly defined. Vassetzky and Kramerov (2013) consider a SINE family as a set of SINEs of a common origin and consisting of the same modules (except the tail, which can vary even within the same family). Thus, similar SINEs with different LINE-derived regions (e.g. mammalian Ther-2 and Mar-1) belong to different families. Long insertions are considered as modules. At the same time, internal deletions or duplications within modules do not give birth to a new family; although a combination of complete or almost complete SINEs (complex SINEs) is considered as a new family. Finally, there are a few SINEs with similar structure but of independent origin (e.g. simple SINEs: ID in rodents, vic-1 in camels and DAS-I in armadillos) that are consequently considered as different families (Vassetzky and Kramerov 2013).

SINEs have frequently been used to explore phylogenetic relationships, based on insertional polymorphism, in humans (Perna et al. 1992), rodents (Churakov et al. 2010) and others animals. In plants, several SINEs have been identified, such as in Brassica napus (Deragon et al. 1994), Oryza sativa (Umeda et al. 1991), Nicotiana tabacum (Yoshioka et al. 1993), Myotis daubentoni (Borodulina and Kramerov 1999), Solanum tuberosum (Wenke et al. 2011), rice (Tsuchimoto et al. 2008), sugar beet (Schwichtenberg et al. 2015). However, plant SINEs remains largely unidentified due to their low level of retrotransposition. Although SINEs may have an impact on gene expression, their insertions in the genome seem to be more tolerated compared to other retroelements, even in gene-containing regions, probably because of SINE small size (Lenoir et al. 2001). Moreover, SINEs have no conserved coding sequence, making their research very difficult.

The first plant SINEs identified were p-SINE1 of rice and TS of tobacco (Umeda et al. 1991; Yoshioka et al. 1993). The current distribution of known SINE families suggests that SINEs are ubiquitous in plants and that the number of characterised plant SINE families will increase with the number of sequenced plant genomes. For example, SINEs are involved in the evolution of genes in Solanaceae species: they can produce start and stop codons, splice sites, enlarged introns and UTRs to genes (Seibt et al. 2016). Despite SINEs multiple presumed roles, their identification has been achieved only in a limited number of plant species and it does not allow to establish general evolutionary patterns for this kind of repeats. Such a lack of information is apparent especially in non-model plants whose genome has not been sequenced yet.

Olea europaea L. has a medium sized haploid genome of 1.4 Gbp (Loureiro et al. 2007) whose structure is still largely uncharacterised, despite the economic, cultural and ecological importance of olive trees in the Mediterranean area, now extending to other regions. Concerning the repetitive component of the olive genome, tandemly repeated sequences and putative retrotransposon fragments were isolated and characterised (Katsiotis et al. 1998; Bitonti et al. 1999; Minelli et al. 2000; Lorite et al. 2001; Contento et al. 2002; Stergiou et al. 2002; Natali et al. 2007). Recently, next generation sequencing technologies and different computational procedures have been used to gain a general insight into the composition of the olive genome and its repetitive fraction (Barghini et al. 2014, 2015a). Illumina and 454 reads from genomic DNA were assembled following different procedures, obtaining more than 200,000 differently abundant contigs, with mean lengths higher than 1000 nt. By combining identification and mapping of repeated sequences, it was established that tandem repeats represent a very large portion of the olive genome (more than a quarter of the whole genome), consisting of six main families of different length. Such a large proportion of tandem repeats is quite rare in plant genomes. The other large abundant class in the olive genome is represented by transposable elements (especially LTR-REs). Such analyses showed that non-LTR retrotransposons are poorly represented, as frequently observed in plant genomes (Barghini et al. 2014).

Recently, a draft assembly of the genome of O. europaea has been released (Cruz et al. 2016), providing a valuable resource for the study of key phenotypic traits of this important tree. However, this genome draft is quite fragmented (11,038 scaffolds with N50 = 443 Kb) and the work was especially focused on assembly procedure and olive gene prediction, while analysis of repetitive DNA was largely incomplete.

With the aim of obtaining a complete characterisation of the repetitive portion of the olive genome that will be useful for a precise annotation of the genome sequence, we analysed the presence of SINEs. We searched for putative full-length SINEs, with different structures, in a set of Roche-454 reads of relevant length, using a bioinformatics tool for the modules detection and a pipeline for manual validation of sequences. This analysis allowed us to perform the first targeted identification and characterisation of tRNA-derived SINEs of O. europaea. The pipeline designed for SINE identification can be used for every plant species, even in the absence of a reference genome and will allow wider analyses on this class of repetitive DNA in plants.

Materials and methods

SINEs characterisation

The identification of full-length SINEs in O. europaea subsp. europaea var. sativa, cv. Leccino, was performed using the SINEFinder tool based on searching structural features of SINEs (Wenke et al. 2011) on olive 454 sequences available on the NCBI-SRA website (http://www.ncbi.nlm.nih.gov/sra, SRX474079). We trimmed 454 reads for quality with default setting and checked for adapters using CLC-BIO Genomic Workbench 7.0.4 (CLC-BIO, Aarhus, Denmark). We obtained 8079,610 single reads, with mean read lengths of 407 nt, corresponding to 2.3 genome equivalents. A subset of 1718,887 reads was prepared selecting reads longer than 400 nt, using an internally developed Perl script.

Structural features of plant SINEs include a tRNA origin, an internal polymerase III promoter (made of A and B boxes) in their 5′ tRNA-related region, a tRNA-unrelated region of variable length, a short stretch of T or A at their 3′-end and the presence of flanking direct repeats (Target Site Duplication, TSD). These key structural features were used by a search algorithm (SINEFinder) to identify the putative SINEs, applying default parameters. After isolation, the SINEs were manually curated to verify the occurrence of the TSD and submitted to BLASTN at the NCBI website (http://www.ncbi.nlm.nih.gov/), with E <10^–10, and to RepeatMasker (Jurka 2000, version open -4.0.5) against Repbase (http://www.girinst.org/repbase/index.html), and against databases of DNA repeats of olive (Barghini et al. 2014), rice (Copetti et al. 2015), sunflower (Natali et al. 2013), and Posidonia oceanica (Barghini et al. 2015b), to exclude those sequences sharing similarity with other repeat classes, as retrotransposons and miniature inverted-repeat transposable elements. Only elements with verified TSDs and showing similarity with SINEs of other species or no similarity with other repeats were retained.

Target site duplications were then manually removed from each sequence. Finally, all isolated SINEs were subjected to BLASTClust analysis performed using stringent parameters (-p F -L 1 -b T -S 100 -a 4), to produce a nonredundant sequence set.

Estimation of sequence abundance and conservation

The abundance of SINEs in the olive genome sequence was estimated by mapping a large Illumina sequence read set available from the NCBI-SRA website (SRX465835) to the dataset of SINEs. The Illumina reads were sequenced from DNA of the very same plant of cv. Leccino, used for Roche 454 sequencing. Illumina reads were preprocessed to remove Illumina adapters using the CLC-BIO Genomic Workbench 7.0.4 as well. The tool was also used for quality trimming with default setting to remove organellar reads from the set (using an in-house olive organellar sequence database, Barghini et al. 2014) and to define the length of the reads at 50 nt. The nuclear read set included 150,000,000 reads, corresponding to 7500,000,000 nt and 5.4 genome equivalents.

For these analysis, only the 5′-ends of the putative SINEs were collected (ranging from 52 to 98 nt in length) using a script that cut the sequences 10 bases downstream the B box, to exclude low complexity regions from aligning and mapping. Mapping was performed using the CLC-BIO Genomic Workbench 7.0.4 software with the following mapping parameters: mismatch cost 1, deletion cost 1, insertion cost 1, length fraction 0.7, and similarity 0.7 (Barghini et al. 2014). Since CLC-BIO mapping algorithm distributes multi-reads randomly, the number of mapped reads to a single sequence is simply an indication and not a precise quantification of its abundance.

In other analyses, mapping onto all isolated SINEs (deprived of the low complexity region at 3′-end, as above) and to a sample of 254 full-length LTR retrotransposons (Barghini et al. 2015a) was performed using CLC-BIO Genomic Workbench 7.0.4, with parameters set at different stringencies to compare the sequence conservation of different repeat classes (Natali et al. 2015). Mismatch cost, deletion cost and insertion cost were fixed at 1, and similarity and length fraction were both fixed at 0.9, 0.7, or 0.5 to obtain high, medium or low stringencies, respectively.

Analysis of SINEs positions in the genome

RepeatMasker (Smit et al. 2004) was used to evaluate the proximity of putative SINEs to genes or transposable elements in the olive genome. A library made up by 210,068 contigs composing a draft assembly of the olive genome (Barghini et al. 2014) was masked with the set of putative SINEs using the following parameters: –s –x –no_is –nolow. SINE-containing contigs were then scored for the concurrent presence of genic or repeated sequences.

Phylogenetic analyses

The 5′-ends of the SINEs deprived of low complexity regions as above were aligned using CLUSTALX (http://www.ebi.ac.uk/Tools/msa/clustalo/), and a phylogenetic tree was generated and visualised using FigTree (http://tree.bio.ed.ac.uk/software/figtree/).

Identification of modules

The SINESearch tool was used for the identification of SINE families and modules. SINESearch is based on using FASTA files and simple parameters such as overlap length and sequence identity that are biologically sensible for detecting SINE modules (Vassetzky and Kramerov 2013).

SINESearch offers four banks for searching SINE similarities. We used minimum sequence identity of 65 %, minimum overlap length of 90 % of the query sequence length as a starting point against SINEBank and minimum sequence length of 60 nt, as recommended by the authors, for the search against the Plant tRNABank and, subsequently, against the RNABase bank. The first identified tRNA was used for classification.

Results

We searched putative SINEs in 1718,887 454 reads (longer than 400 nt) of olive genomic DNA and collected 2996 sequences. These sequences were manually validated and subjected to BLASTN analysis against NCBI databases and Repbase to remove false positive and to BLASTClust analysis to produce a nonredundant dataset of SINEs. At the end of the process, we retained a total of 227 unique putative SINEs constituting the OLEASINE dataset. Their sequences are available at the Department of Agriculture, Food, and Environment of Pisa University repository website (http://www.agr.unipi.it/ricerca/plantgenetics-and-genomics-lab/sequence-repository).

The isolated full-length elements covered 54,555 nt on a total of 952,122,442 sequenced nucleotides. The length of the putative SINEs identified in this work ranges from 140 to 362 bp, with a mean length of 240.0 bp. Figure 1 shows the SINE length distribution in OLEASINE. Two peaks can be observed in the length distribution, the first around 212 bp and the second around 262 bp.

The sequence similarity of the putative SINEs to other elements already described in other species was determined using the online tool SINESearch. The composition of OLEASINE is reported in Fig. 2. The vast majority of these SINEs, 222 out of 227 elements, show similarity to only four types, of which two belong to Solanaceae (SolS-IIIa and SolS-IIIb, identified in Solanum tuberosum and Nicotiana benthamiana, respectively), one to Scrophulariaceae (ScroS-I) and one to Populus trichocarpa (PTr-2).

A schematic representation of the structure of the major SINE types identified in our study is reported in Fig. 3. One of the most distinctive features of the SINE structure is the presence of a sequence with high similarity to tRNA sequence, which is necessary for SINE transcription because of the presence of an internal promoter, recognised by RNA polymerase III. A search for tRNA sequences was positive in 209 out of 227 putative SINEs. In 209 tRNA-carrying SINEs, tRNAs related to 16 amino acids were found (Fig. 4), the most common tRNAs being those of tyrosine, cysteine and asparagine.

In another analysis, the abundance of each putative SINE in the olive genome was measured by mapping a large set of Illumina reads to the putative SINE sequences. Assuming that Illumina reads in our experiments are sampled without bias for particular sequence types, mapping the reads onto a set of sequences provides a method of estimating the abundance of any genomic sequence of the set (Swaminathan et al. 2007; Tenaillon et al. 2011; Natali et al. 2013).

In our analyses, putative SINEs were previously cut 10 nt downstream the B module to exclude the low complexity region that can be subjected to uncorrected alignment. A total of 5.4 genome equivalents of Illumina reads (150,000,000 nuclear reads, 50 nt in length) were mapped onto the OLEASINE. We used such relatively short reads because the mean length of sequences to be mapped was 72.3 nt. Overall, only 46,143 reads (0.03 %) matched the OLEASINE sequences, with a mean average coverage of 116.28. Although CLC-BIO mapping algorithm distributes multi-reads randomly, the number of multi-reads resulted lower than 1 % (data not shown); hence it did not affect significantly abundance values of each element.

The coverage distribution of the whole set of putative SINEs is reported in Fig. 5. Having mapped 5.4 genome equivalents, the expected average coverage for a single copy sequence was 5.4. Consequently, SINEs with average coverage ranging from 0 to 21.6 should correspond to single or low copy number (1–4) elements; sequences with average coverage between 21.6 and 540 should be medium repeated in the genome (4–100 copies), and those with average coverage higher than 540 should be highly repeated.

Overall, the majority of identified SINEs (211/227) were present as lowly or medium repeated elements. Only five putative SINEs were highly repeated in the olive genome; the most repeated one showed an average coverage of 1211.8, grossly corresponding to 224 copies per genome.

Plant SINEs are mainly dispersed randomly in genomes, although they are rarely present in heterochromatic, pericentromeric regions and have a preference for gene-rich regions (Wenke et al. 2011). To evaluate this aspect in the olive genome, we searched sequences with similarity to putative olive SINEs in the 210,068 contigs which compose a draft assembly of the olive genome (Barghini et al. 2014) and found 920 SINE-containing contigs; then, we scored each contigs for similarity to genic or non-genic sequences. The occurrence of putative gene and transposon-related sequences in SINE-flanking regions is summarised in Table 1. In many cases (184 plus 65 contigs, 27.1 % in total), the putative SINE was adjacent to gene sequences. Two-hundred-fifty plus 65 contigs (34.3 %) were close to retrotransposon fragments. Only 14 contigs (1.5 %) were flanked by DNA transposon fragments. It is to be considered that transposons account for 45.7 % of the olive genome and genes for less than 10 % (Barghini et al. 2014).

Table 1 Occurrence of sequences with similarity to genes and transposable elements (retrotransposons and DNA transposons) in the 920 SINE-containing contigs of the Whole Genome Set of Assembled Sequences obtained by Barghini et al. (2014)

Full size table

To investigate the relationship among the olive SINEs, the 5′ termini of the elements as above (including A and B boxes, and ranging from 52 to 98 nt in length) were aligned (Supplementary Material # 1) to produce a distance tree (Fig. 6). We used the 5′ terminus instead of full-length elements, because the 3′-ends of SINEs show low complexity and this can produce uncorrect alignments; moreover, the 5′ tRNA-related region of SINEs is the most conserved, carrying the RNA polymerase III promoter needed for element replication. The phylogenetic tree (Fig. 6) shows the occurrence of six significantly distinct groups, of which four are composed by only one element and two include 135 (family A) and 88 (family B) elements, respectively.

To gain insight of SINE dynamics during olive genome evolution lineages, we analysed SINE sequence conservation in comparison to that of LTR-retrotransposon lineages. In Fig. 7, we report the relative number of Illumina reads matching to the different repeat families applying different stringency parameters. Relaxing stringency parameters implied an increase in the number of mapped reads. The ratio between the abundance calculated at medium and high stringency (see “Materials and methods”) for a given lineage should indicate the degree of sequence conservation of the elements belonging to that lineage. Comparing two lineages, such that the lower the ratio, the higher the sequence conservation. SINE sequences are by far less conserved than retrotransposon, with the SINE family B (including poplar-like elements) less conserved than family A (whose elements show similarity to Solanaceae and Scrophulariaceae).

With regard to their abundance in the olive genome, the mean average coverage of A family was 84.8, whereas that of B family was 149.83, i.e. near twofold than that of the A family.

Discussion

SINEs have been well studied in animal genomes, in which they form abundant sequence families. However, they have been characterised only in a limited number of plant species (Umeda et al. 1991; Yoshioka et al. 1993; Deragon et al. 1994; Borodulina and Kramerov 1999; Yasui et al. 2001; Gadzalski and Sakowicz 2011; Wenke et al. 2011; Seibt et al. 2016).

In this work, we have identified and described 227 putative SINEs of olive tree. Such sequences form a small portion of the olive genome. The size of the putative SINEs identified in this work are similar to those typically found in plant SINEs (Deragon and Zhang 2006; Kramerov and Vassetzky 2011). Length distribution reflects the occurrence of two distinct major groups of SINEs.

Analysis of sequence similarity showed the occurrence of only four types, already characterised in Solanaceae (two types), in Scrophulariaceae and in Salicaceae. Almost all presented a sequence with high similarity to tRNA sequences, especially those of tyrosine, cysteine and asparagine. Tyrosine- and cysteine-tRNA related sequences have already been found in SINEs of Populus trichocarpa (Wenke et al. 2011).

Mapping Illumina reads to the putative SINEs was performed to estimate the abundance of SINEs in the olive genome. The resulting low average coverage (0.03 % of the genome) is comparable to that already reported by Barghini et al. (2014) indicating that SINEs plus tRNA genes account for only 0.046 % of the olive genome, a value in the range of that reported in numerous plant species (Wenke et al. 2011). No SINE showed very high abundance in the olive genome, as already observed in other plant species (Wenke et al. 2011).

Olive SINEs resulted located near or within protein-encoding genes at high frequency, compared to other retrotransposons or DNA transposons. These data confirm that SINEs are preferentially associated to genes, similar to that reported also for other plant species, for example, Triticum aestivum (Ben-David et al. 2013), in which approximately 38 % of the Au SINEs are inserted into or adjacent to transcribed regions.

Even if the occurrence of other SINEs in the genome cannot be ruled out, phylogenetic analyses of the 5′ termini of olive SINEs showed the occurrence of two major groups (beside to other four including only one element), that we called families A (related to Solanaceae and Scrophulariaceae SINEs) and B (related to Salicaceae SINEs). These two families showed different sequence conservation, elements of family A being more conserved than those of family B. Moreover, SINE sequences resulted by far less conserved than those of LTR retrotransposons.

Although sequence similarity-based dating can be subject to reservation when inferring over long evolutionary periods, these data suggest that SINE amplification in olive was more ancient than that of LTR retrotransposons, allowing to accumulate more mutations in SINEs than in LTR retrotransposons. Interestingly, Solanaceae and Scrophulariaceae are phylogenetically closer than Salicaceae (the poplar family) to Oleaceae (Judd and Olmstead 2004), in agreement with higher conservation of SINEs of family A than family B. Alternatively, higher sequence diversity in SINEs compared to LTR retrotransposons might be related to the different retrotranscriptases generally involved in reverse transcription of SINEs and LTR retrotransposons. It is known, for example, that reverse transcriptases from oncoretroviruses, lentiviruses, non-LTR and LTR retrotransposons are differently prone to errors during reverse transcription (Jamburuthugoda and Eickbush 2011), probably resulting in a different sequence conservation of these repeat classes.

The difference in abundance between elements of the A and B family (the mean average coverage of B family being near twofold than that of the A family), coupled with the lower sequence conservation of members of the B family, should indicate that most B SINEs amplification has occurred earlier than that of A SINEs. Given their similarity to elements of Salicaceae, it might be argued that in olive tree progenitors, SINE expansion has occurred especially before separation between rosids (to which Solanaceae, Scrophulariaceae and Oleaceae belong) and asterids (to which Salicaceae belong), i.e. the two olive SINE families should have originated at two different times during evolution. Once a precise and detailed olive genome sequence will be available, analysis of SINE loci will allow inferring on both mechanism and function of SINE amplification in this species.

In conclusion, although manual validation of the identified elements was necessary (only 227 out of 2996 elements were maintained), the results show that our strategy is appropriate for SINE identification and will favour further analyses on these relatively unknown elements to be performed in other plant species, even in the absence of a sequenced genome. Functional analyses of putative SINEs will contribute further clarification on the specific mechanisms of SINE replication in plants and the possible reason for their relative paucity in plant genomes.

References

Barghini E, Natali L, Cossu RM, Giordani T, Pindo M, Cattonaro F et al (2014) The peculiar landscape of repetitive sequences in the olive (Olea europaea L.) genome. Genome Biol Evol 6:776–791
Article PubMed PubMed Central Google Scholar
Barghini E, Natali L, Giordani T, Cossu RM, Scalabrin S, Cattonaro F et al (2015a) LTR retrotransposon dynamics in the evolution of the olive (Olea europaea) genome. DNA Res 22:91–100
Article CAS PubMed Google Scholar
Barghini E, Mascagni F, Natali L, Giordani T, Cavallini A (2015b) Analysis of the repetitive component and retrotransposon population in the genome of a marine angiosperm, Posidonia oceanica (L.) Delile. Mar Genom 24:397–404
Article Google Scholar
Ben-David S, Yaakov B, Kashkush K (2013) Genome-wide analysis of Short Interspersed Nuclear Elements SINES revealed high sequence conservation, gene association and retrotranspositional activity in wheat. Plant J 76:201–210
CAS PubMed PubMed Central Google Scholar
Bitonti MB, Cozza R, Chiappetta A, Contento A, Minelli S, Ceccarelli M et al (1999) Amount and organization of the heterochromatin in Olea europaea and related species. Heredity 83:188–195
Article CAS PubMed Google Scholar
Boeke JD (1997) LINEs and Alus—the polyA connection. Nature Genet 16:6–7
Article CAS PubMed Google Scholar
Borodulina OR, Kramerov DA (1999) Wide distribution of short interspersed elements among eukaryotic genomes. FEBS Lett 457:409–413
Article CAS PubMed Google Scholar
Churakov G, Sadasivuni MK, Rosenbloom KR, Huchon D, Brosius J, Schmitz J (2010) Rodent evolution: back to the root. Mol Biol Evol 27:1315–1326
Article CAS PubMed Google Scholar
Contento A, Ceccarelli M, Gelati MT, Maggini F, Baldoni L, Cionini PG (2002) Diversity of Olea genotypes and the origin of cultivated olives. Theor Appl Genet 104:1229–1238
Article CAS PubMed Google Scholar
Copetti D, Zhang J, El Baidouri M, Gao D, Wang J, Barghini E et al (2015) RiTE database: a resource database for genus-wide rice genomics and evolutionary biology. BMC Genom 16:538
Article Google Scholar
Cruz F, Julca I, Gómez-Garrido J, Loska D, Marcet-Houben M, Cano E et al (2016) Genome sequence of the olive tree, Olea europaea. GigaScience 5:29
Article PubMed PubMed Central Google Scholar
Deragon JM, Zhang XY (2006) Short interspersed elements (SINEs) in plants: origin, classification, and use as phylogenetic markers. Syst Biol 55:949–956
Article PubMed Google Scholar
Deragon JM, Landry BS, Pélissier T, Tutois S, Tourmente S, Picard G (1994) An analysis of retroposition in plants based on a family of SINEs from Brassica napus. J Mol Evol 39:378–386
Article CAS PubMed Google Scholar
Dewannieux M, Esnault C, Heidmann T (2003) LINE-mediated retrotransposition of marked Alu sequences. Nat Genet 35:41–48
Article CAS PubMed Google Scholar
Ferrigno O, Virolle T, Djabari Z, Ortonne JP, White RJ, Aberdam D (2001) Transposable B2 SINE elements can provide mobile RNA polymerase II promoters. Nat Genet 28:77–81
CAS PubMed Google Scholar
Gadzalski M, Sakowicz T (2011) Novel SINEs families in Medicago truncatula and Lotus japonicus: bioinformatic analysis. Gene 480:21–27
Article CAS PubMed Google Scholar
Galli G, Hofstetter H, Birnstiel ML (1981) Two conserved sequence blocks within eukaryotic tRNA genes are major promoter elements. Nature 294:626–631
Article CAS PubMed Google Scholar
Gogolevsky KP, Vassetzky NS, Kramerov DA (2008) Bov-B-mobilized SINEs in vertebrate genomes. Gene 407:75–85
Article CAS PubMed Google Scholar
Gogolevsky KP, Vassetzky NS, Kramerov DA (2009) 5S rRNA-derived and tRNA-derived SINEs in fruit bats. Genomics 93:494–500
Article CAS PubMed Google Scholar
Jamburuthugoda VK, Eickbush TH (2011) The reverse transcriptase encoded by the non-LTR retrotransposon R2 is as error-prone as that Encoded by HIV-1. J Mol Biol 407:661–672
Article CAS PubMed PubMed Central Google Scholar
Judd WS, Olmstead RG (2004) A survey of tricolpate (eudicot) phylogenetic relationships. Am J Bot 91:1627–1644
Article PubMed Google Scholar
Jurka J (2000) Repbase update: a database and an electronic journal of repetitive elements. Trends Genet 16:418–420
Article CAS PubMed Google Scholar
Kajikawa M, Okada N (2002) LINEs mobilize SINEs in the eel through a shared 39 sequence. Cell 111:433–444
Article CAS PubMed Google Scholar
Kapitonov VV, Jurka J (2003) A novel class of SINE elements derived from 5S rRNA. Mol Biol Evol 20:694–702
Article CAS PubMed Google Scholar
Katsiotis A, Hagidimitriou M, Douka A, Hatzopoulos P (1998) Genomic organization, sequence interrelationship, and physical localization using in situ hybridization of two tandemly repeated DNA sequences in the genus Olea. Genome 41:527–534
Article CAS PubMed Google Scholar
Kramerov DA, Vassetzky NS (2011) Origin and evolution of SINEs in eukaryotic genomes. Heredity 107:487–495
Article CAS PubMed PubMed Central Google Scholar
Lenoir A, Lavie L, Prieto JL, Goubely C, Coté JC, Pélissier T, Deragon JM (2001) The evolutionary origin and genomic organization of SINEs in Arabidopsis thaliana. Mol Biol Evol 18:2315–2322
Article CAS PubMed Google Scholar
Lorite P, Garcia MF, Carrillo JA, Palomeque T (2001) A new repetitive DNA sequence family in the olive (Olea europaea L.). Hereditas 134:73–78
Article CAS PubMed Google Scholar
Loureiro J, Rodriguez E, Costa A, Santos C (2007) Nuclear DNA content estimations in wild olive (Olea europaea L. ssp. europaea var. sylvestris Brot.) and portuguese cultivars of O. europaea using flow cytometry. Genet Res Crop Evol 54:21–25
Article CAS Google Scholar
Minelli S, Maggini F, Gelati MT, Angiolillo A, Cionini PG (2000) The chromosome complement of Olea europaea L.: characterization by differential staining of the chromatin and in situ hybridization of highly repeated DNA sequences. Chromosome Res 8:615–619
Article CAS PubMed Google Scholar
Natali L, Giordani T, Buti M, Cavallini A (2007) Isolation of Ty1-Copia putative LTR sequences and their use as a tool to analyse genetic diversity in Olea europaea. Mol Breed 19:255–265
Article CAS Google Scholar
Natali L, Cossu RM, Barghini E, Giordani T, Buti M, Mascagni F et al (2013) The repetitive component of the sunflower genome as revealed by different procedures for assembling next generation sequencing reads. BMC Genom 14:686
Article CAS Google Scholar
Natali L, Cossu RM, Mascagni F, Giordani T, Cavallini A (2015) A survey of Gypsy and Copia LTR-retrotransposon superfamilies and lineages and their distinct dynamics in the Populus trichocarpa (L.) genome. Tree Genet Genomes 11:107
Article Google Scholar
Okada N (1991) SINEs. Curr Opin Genet Dev 1:498–504
Article CAS PubMed Google Scholar
Perna NT, Batzer MA, Deininger PL, Stoneking M (1992) Alu insertion polymorphism—a new type of marker for human-population studies. Hum Biol 64:641–648
CAS PubMed Google Scholar
Rowold DJ, Herrara RJ (2000) Alu elements and the human genome. Genetica 108:57–72
Article CAS PubMed Google Scholar
Schramm L, Hernandez N (2002) Recruitment of RNA polymerase III to its target promoters. Genes Dev 16:2593–2620
Article CAS PubMed Google Scholar
Schwichtenberg K, Wenke T, Zakrzewski F, Seibt KM, Minoche A, Dohm JC et al (2015) Diversification, evolution and methylation of short interspersed nuclear element families in sugar beet and related Amaranthaceae species. Plant J 85:229–244
Article Google Scholar
Seibt KM, Wenke T, Muders K, Truberg B, Schmidt T (2016) Short Interspersed Nuclear Elements (SINEs) are abundant in Solanaceae and have a family-specific impact on gene structure and genome organization. Plant J. doi:10.1111/tpj.13170)
Google Scholar
Singer MF (1982) Highly repeated sequences in mammalian genomes. Int Rev Cytol 76:67–112
Article CAS PubMed Google Scholar
Smit AFA, Hubley R, Green P (2004) RepeatMasker. Open-3.0. http://www.repeatmasker.org/
Stergiou G, Katsiotis A, Hagidimitriou M, Loukas M (2002) Genomic and chromosomal organization of Ty1-Copia-like sequences in Olea europaea and evolutionary relationships of Olea retroelements. Theor Appl Genet 104:926–933
Article CAS PubMed Google Scholar
Sun FJ, Fleurdépine S, Bousquet-Antonelli C, Caetano-Anolle’s G, Deragon JM (2007) Common evolutionary trends for SINE RNA structures. Trends Genet 23:26–33
Article PubMed Google Scholar
Swaminathan K, Varala K, Hudson ME (2007) Global repeat discovery and estimation of genomic copy number in a large, complex genome using a high-throughput 454 sequence survey. BMC Genom 8:132
Article Google Scholar
Tenaillon MI, Hufford MB, Gaut BS, Ross-Ibarra J (2011) Genome size and transposable element content as determined by high-throughput sequencing in maize and Zea luxurians. Genome Biol Evol 3:219–229
Article CAS PubMed PubMed Central Google Scholar
Tsuchimoto S, Hirao Y, Ohtsubo E, Ohtsubo H (2008) New SINE families from rice, OsSN, with poly(A) at the 3′ ends. Genes Genet Syst 83:227–236
Article CAS PubMed Google Scholar
Ullu E, Tschudi C (1984) Alu sequences are processed 7SL RNA genes. Nature 312:171–172
Article CAS PubMed Google Scholar
Umeda M, Ohtsubo H, Ohtsubo E (1991) Diversification of the plant Short Interspersed Nuclear Elements 11 of 12 rice waxy gene by insertion of mobile DNA elements into introns. Jpn J Genet 66:569–586
Article CAS PubMed Google Scholar
Vassetzky NS, Kramerov DA (2013) SINEBase: a database and tool for SINE analysis. Nucleic Acids Res 41:D83–D89
Article CAS PubMed Google Scholar
Wenke T, Dobel T, Sorensen TR, Junghans H, Weisshaar B, Schmidt T (2011) Targeted identification of short interspersed nuclear element families shows their widespread existence and extreme heterogeneity in plant genomes. Plant Cell 23:3117–3128
Article CAS PubMed PubMed Central Google Scholar
Wicker T, Sabot F, Hua-Van A, Bennetzen JL, Capy P, Chalhoub B et al (2007) A unified classification system for eukaryotic transposable elements. Nat Rev Genet 8:973–982
Article CAS PubMed Google Scholar
Yasui Y, Nasuda S, Matsuoka Y, Kawahara T (2001) The Au family, a novel short interspersed element (SINE) from Aegilops umbellulata. Theor Appl Genet 102:463–470
Article CAS Google Scholar
Yoshioka Y, Matsumoto S, Kojima S, Ohshima K, Okada N, Machida Y (1993) Molecular characterization of a short interspersed repetitive element from tobacco that exhibits sequence homology to specific tRNAs. Proc Natl Acad Sci USA 90:6562–6566
Article CAS PubMed PubMed Central Google Scholar

Download references

Acknowledgments

This study was funded by MiPAAF, Italy, Project “OLEA—Genomica e miglioramento genetico dell’olivo”.

Author information

Authors and Affiliations

Department of Agriculture, Food and Environment, University of Pisa, Via del Borghetto 80, I-56124, Pisa, Italy
Elena Barghini, Flavia Mascagni, Lucia Natali, Tommaso Giordani & Andrea Cavallini

Authors

Elena Barghini
View author publications
You can also search for this author in PubMed Google Scholar
Flavia Mascagni
View author publications
You can also search for this author in PubMed Google Scholar
Lucia Natali
View author publications
You can also search for this author in PubMed Google Scholar
Tommaso Giordani
View author publications
You can also search for this author in PubMed Google Scholar
Andrea Cavallini
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Andrea Cavallini.

Ethics declarations

Conflict of interest

All authors declare that they have no conflict of interest.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Additional information

Communicated by S. Hohmann.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (PDF 115 kb)

Rights and permissions

Reprints and permissions

About this article

Cite this article

Barghini, E., Mascagni, F., Natali, L. et al. Identification and characterisation of Short Interspersed Nuclear Elements in the olive tree (Olea europaea L.) genome. Mol Genet Genomics 292, 53–61 (2017). https://doi.org/10.1007/s00438-016-1255-3

Download citation

Received: 14 May 2016
Accepted: 03 October 2016
Published: 06 October 2016
Issue Date: February 2017
DOI: https://doi.org/10.1007/s00438-016-1255-3

Keywords

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

Identification and characterisation of Short Interspersed Nuclear Elements in the olive tree (Olea europaea L.) genome

Abstract

Similar content being viewed by others

Low coverage sequencing for repetitive DNA analysis in Passiflora edulis Sims: citogenomic characterization of transposable elements and satellite DNA

Genome-wide analysis of LTR-retrotransposon diversity and its impact on the evolution of the genus Helianthus (L.)

Comparative analysis of miniature inverted–repeat transposable elements (MITEs) and long terminal repeat (LTR) retrotransposons in six Citrus species

Introduction