Introduction

Ribosomal DNA (rDNA) consists of genes present in multiple copies, organized in tandemly repeated clusters. In eukaryotic genomes, each repeat unit contains the 18S, 5.8S and 28S coding regions, separated by two internal transcribed spacers, ITS1 and ITS2. “Concerted evolution” is believed to maintain intragenomic similarity among these copies through mechanisms such as gene conversion and crossing over [22]. However, the gain or loss of repeats and the accumulation of mutations during the continual turnover of DNA result in high levels of nucleotide and insertion-deletion polymorphism within and among populations [10]. Consequently, rDNA regions have proven useful markers for phylogenetic inference at the species, genus and family levels.

ITS1 has generally been excluded from phylogenetic applications because of its high length variability [3]. Nonetheless, its rapid divergence rate may help delineate cryptic and incipient species. In arthropods, ITS1 size is highly variable, and the largest sizes have been reported in insects. For example, Anopheles punctulatus individuals contain multiple ITS1 variants ranging from 1.2 to 8.0 kb [1]. In some other taxa, ITS1 length variation results from three to six short internal repeats [2, 14, 35] or from longer repeated sequences [28]. In crustaceans, ITS1 lengths have been reported to range from 182 to 820 bp [3]. One exception is crayfish, whose ITS1 sequences, the longest found thus far, are about 1300 bp long owing to a large number of simple sequence repeats [7, 12].

The secondary structure of ITS1 plays an important role in defining the cleavage sites that release the mature rRNA molecules during processing [23]. These structures are functionally important and are therefore thought to be well conserved [4]. Because evolutionary constraints act mainly on these cleavage sites, it has been postulated that the rest of the sequence might evolve close to neutrality [34]. In most eukaryotes, ITS1 consists of an open multibranch loop with several helices [8]. Where tandem repeats are present, they pair with one another to form long hairpins with a conserved motif in the terminal loop.

In this paper, we first characterize the size range of ITS1 in Alpheidae and other Caridea and discuss how the presence or absence of repeat elements affects ITS1 secondary structure. We then compare the evolutionary rates of COI and ITS1 among sympatric cryptic species of the ecologically important Alpheus lottini species complex, explicitly accounting for differences in the number of repeat units. Finally, we estimate the timing of divergence between cryptic clades and species from the nucleotide substitution rate.

Materials and methods

Material

Crangon crangon, Athanas nitescens, Palaemon elegans, Palaemon serratus, Hippolyte varians and Atyaephyra desmarestii were collected in 2004 in Concarneau, Brittany. Chorocaris chacei, Mirocaris fortunata and Rimicaris exoculata were collected by D. Desbruyeres in 2007 on the Mid-Atlantic Ridge (2400 m depth). Plesionika hsuehuyi was collected by B. Richer de Forges in 1993 in New Caledonia (cruise Bathus 3). Sergia robusta and Alpheus macrocheles were collected by P. Noel on the Atlantic coast of Spain in 2000, and Stylodactylus libratus by R. Cleva in the Marquesas (French Polynesia) in 1983. Acanthephyra pelagica and Ephyrina figuerai were collected by S. Iglesias in 2000 in the Bay of Biscay (depth 1250 m, La Croix Morand). We collected Alpheus lottini in the Indian Ocean (La Réunion) and in the Pacific Ocean around the islands of French Polynesia. Other sampling sources are given in Table 1. DNA extracts of Alpheus cristulifrons, Alpheus cylindricus (eastern Pacific), Alpheus vanderbilti, Alpheus utriensis (Caribbean), Alpheus dentipes, Alpheus sulcatus, Alpheus floridanus, Alpheus rostratus and Alpheus hebes were provided by N. Knowlton and S.T. Williams.

Table 1 Collecting locality and provenance of Alpheus lottini specimens used in this study

DNA extractions, PCR and sequencing

DNA was extracted from one pleopod using the CTAB method, and polymerase chain reactions (PCR) were performed on 0.1 µg of DNA. Two primers were designed for ITS1 amplification: ITS1FW (5′-CACACCGCCCGTCGCTACTA-3′), located at the 3′ end of the 18S rDNA, and ITS3R (5′-TCGACSCACGAGCCRAGTGATC-3′), located at the 5′ end of the 5.8S rDNA. Three internal primers were used to complete the longest ITS1 sequences: ITS5 (5′-GCACCTCAGAAGAGAACCATG-3′), ITS24R (5′-GAAGCGGGGTTCCCTCACAC-3′) and ITS30R (5′-CTGTGGTGGGCTCCAACCCT-3′). Two additional primers, ITS9 (5′-GCCAATGCCCCAGGTGGGGTCA-3′) and ITS3R, were used to amplify the end of the molecule. For COI, the primer combination COIF (5′-CCAGCTGGAGGAGGAGAYCC-3′) and H7188 (5′-CATTTAGGCCTAAGAAGTGTTG-3′) was used [36]. All PCRs were performed following the GE Healthcare protocol (Ready-To-Go PCR) at an annealing temperature of 52 °C. Sequencing reactions were performed on purified PCR products with BigDye® sequencing reagents (Applied Biosystems™). An initial denaturation step (2 min at 96 °C) was followed by 40 cycles of 96 °C for 30 s, 50 °C for 30 s and 72 °C for up to 4 min. Sequences were read on an ABI3130 automated sequencer. In one case, ITS1 PCR products were ligated into the pGEM®-T Easy Vector (Promega) and transformed into Escherichia coli JM109 competent cells; plasmids were extracted from four colonies and their inserts sequenced.
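ITS3R and COIF contain IUPAC ambiguity codes (S = C/G, R = A/G, Y = C/T), so each is in fact a mixture of several oligonucleotides. A minimal Python sketch of how a degenerate primer expands into its concrete variants; the function name is hypothetical and the ambiguity table is the standard IUPAC nucleotide code.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand_degenerate(primer):
    """Return every concrete sequence encoded by a degenerate primer (5'->3')."""
    choices = [IUPAC[base] for base in primer.upper()]
    return ["".join(combo) for combo in product(*choices)]


primers = {
    "ITS3R": "TCGACSCACGAGCCRAGTGATC",  # S = C/G, R = A/G
    "COIF": "CCAGCTGGAGGAGGAGAYCC",     # Y = C/T
}
for name, seq in primers.items():
    variants = expand_degenerate(seq)
    print(f"{name}: {len(variants)} variants")
    for v in variants:
        print("  ", v)
```

Each ambiguity code multiplies the number of variants, so ITS3R (two codes) is a pool of four primers and COIF (one code) a pool of two, which dilutes the concentration of any single perfectly matching variant.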

Data analysis

DNA sequences were aligned with the ClustalW accessory application in BioEdit [11], and ITS1 alignments were edited manually. The evolutionary history was inferred by Maximum Likelihood, using the Kimura two-parameter (K2P) model [15] for COI and the Tamura-Nei model [27] for ITS1. These best-fit models were selected in MEGA7 [18], with a gamma shape parameter G = 0.65 and an estimated proportion of invariable sites of 0.50. Bayesian posterior probabilities were also estimated by Markov Chain Monte Carlo (MCMC) analysis in MrBayes (version 3.2.4). Trees are drawn with branch lengths proportional to the number of substitutions per site and show both ML bootstrap support (BS) and Bayesian posterior probabilities (PP). The ITS1 analysis involved 52 sequences, and all positions containing gaps or missing data were eliminated.
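For reference, the Kimura two-parameter distance separates the proportion of transitions (P) from that of transversions (Q) and corrects for multiple hits as d = -(1/2) ln(1 - 2P - Q) - (1/4) ln(1 - 2Q). The Python sketch below computes it after complete deletion of gap and missing-data columns, mirroring the setting described above. It omits the gamma correction (G = 0.65) and invariable sites used in the analyses, so its values are not expected to match MEGA output; the function names and example sequences are hypothetical.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def complete_deletion(seqs):
    """Drop every alignment column with a gap or non-ACGT character in any sequence."""
    seqs = [s.upper() for s in seqs]
    keep = [i for i in range(len(seqs[0])) if all(s[i] in "ACGT" for s in seqs)]
    return ["".join(s[i] for i in keep) for s in seqs]


def k2p_distance(a, b):
    """Kimura two-parameter distance between two aligned, gap-free sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    transitions = transversions = 0
    for x, y in zip(a, b):
        if x == y:
            continue
        if {x, y} <= PURINES or {x, y} <= PYRIMIDINES:
            transitions += 1    # A<->G or C<->T
        else:
            transversions += 1  # purine <-> pyrimidine
    p = transitions / len(a)
    q = transversions / len(a)
    w1, w2 = 1 - 2 * p - q, 1 - 2 * q
    if w1 <= 0 or w2 <= 0:
        raise ValueError("sequences too divergent for the K2P correction")
    return -0.5 * math.log(w1) - 0.25 * math.log(w2)


# Hypothetical three-sequence alignment with a gap and an ambiguous base
aligned = complete_deletion(["ACGTTAG-CA", "ACGCTAGTCA", "ACATTNGTCA"])
print(aligned)
print(round(k2p_distance(aligned[0], aligned[1]), 4))
```

Complete deletion removes a column as soon as one sequence has a gap there, which keeps every pairwise comparison on the same sites but can discard much of a length-variable marker such as ITS1.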

Prediction of structural domains and motifs

RNA secondary structures were predicted with mfold [13, 39], which searches for thermodynamically optimal secondary structures. ITS1 sequences were folded with default parameters, and the folding was repeated to obtain optimal energies. Changing the temperature setting to 25 °C, as suggested by Ki and Han [14], did not affect the general architecture but lowered the free energies of the predicted structures.
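mfold predicts structures by minimizing free energy with nearest-neighbour energy rules (Zuker's algorithm). The sketch below substitutes the much simpler Nussinov algorithm, which maximizes the number of nested base pairs instead of minimizing energy; it only illustrates the dynamic-programming recursion that underlies both approaches and will not reproduce mfold results. The minimum hairpin loop of three unpaired bases, the inclusion of G-U wobble pairs and the function name are assumptions of this sketch.

```python
# Canonical Watson-Crick pairs plus G-U wobble pairs
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def nussinov(seq, min_loop=3):
    """Maximise nested base pairs; return (pair count, dot-bracket structure)."""
    seq = seq.upper().replace("T", "U")
    n = len(seq)
    # dp[i][j] = maximum number of pairs in seq[i..j]; stays 0 for spans too short to fold
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i + 1][j]  # case 1: base i is unpaired
            for k in range(i + min_loop + 1, j + 1):  # case 2: base i pairs with k
                if (seq[i], seq[k]) in PAIRS:
                    right = dp[k + 1][j] if k < j else 0
                    best = max(best, dp[i + 1][k - 1] + 1 + right)
            dp[i][j] = best

    # Traceback: recover one optimal structure in dot-bracket notation
    structure = ["."] * n
    stack = [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if j - i <= min_loop:
            continue
        if dp[i][j] == dp[i + 1][j]:
            stack.append((i + 1, j))
            continue
        for k in range(i + min_loop + 1, j + 1):
            if (seq[i], seq[k]) in PAIRS:
                right = dp[k + 1][j] if k < j else 0
                if dp[i][j] == dp[i + 1][k - 1] + 1 + right:
                    structure[i], structure[k] = "(", ")"
                    stack.append((i + 1, k - 1))
                    if k < j:
                        stack.append((k + 1, j))
                    break
    score = dp[0][n - 1] if n else 0
    return score, "".join(structure)


print(nussinov("GGGAAAUCCC"))
```

Pair maximization ignores stacking energies, loop penalties and temperature, which is exactly what an energy-based program such as mfold adds; this is also why a temperature change can alter mfold free energies while leaving the overall architecture intact.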

Results

Variation of ITS1 size in Caridea

ITS1 sequences from 27 caridean species belonging to 9 families were analysed (Table 2). The simplest organization, with no repeat units, minisatellites or microsatellites, was found in Alvinocaridae, Stylodactylidae, Hippolytidae, some Alpheidae (Athanas nitescens and Alpheus macrocheles), Penaeidae (Litopenaeus vannamei) and most other Caridea. In other shrimps (Crangon, Acanthephyra, Ephyrina), microsatellites were detected and ITS1 size ranged from 619 to 1179 bp. Although less common, repeat units were identified in the Caridae, Pandalidae and Palaemonidae. Two repeats were present, either at the 5′ end of ITS1 (Palaemon elegans) or at the 3′ end (P. serratus and Macrobrachium rosenbergii). Repeats were more common among Alpheidae, with up to four repeats in Alpheus sulcatus, A. cristulifrons, A. lottini and A. vanderbilti, and ITS1 sizes ranging from 995 to 1915 bp. Sequence similarity among repeats was highly variable, ranging from 50 to 98%. The highest levels were found in A. rostratus (91%) and A. floridanus (up to 98% identity between repeats 1 and 2), whereas A. cristulifrons and A. sulcatus showed the lowest levels of similarity (Table 2). In the A. lottini species complex, ITS1 size ranged from 1597 to 1624 bp, with numerous insertions and deletions in specimens from Fangataufa (Fig. 1).

Table 2 Characteristics of ITS1 across Caridea
Fig. 1

Alignment of ITS1 sequences of the five Alpheus lottini cryptic species: Reunion 2 corresponds to clade A2, Rangiroa 4 to clade A1, Bora Bora M1 to clade B1, Fangataufa to clade C and Moorea A11 to clade B2. Gaps are boxed

Repeat units and ITS1 structure in Caridea

All ITS1 sequences contain helices and stems. The simplest structures were found in A. macrocheles (Fig. 2a) and in Ephyrina, Stylodactylus and Rimicaris (not shown), with four helices and stems. No sequence identity was found among these stems and loops. Repeat units produce either extra loops or mirror loops, and these additional helices occur at different positions depending on thermodynamic constraints. No fixed position could be determined for the repeat units. Two to three loops (D1-1 to D1-3), corresponding to two or three repeats, were present at the beginning of ITS1 in A. rostratus (Fig. 2e), A. cylindricus (Fig. 2f) and A. floridanus (Fig. 2b). In A. cristulifrons (Fig. 2c), the repeat units, located in the middle of the spacer, produce two extra loops at the same positions. In A. lottini, duplication occurred in the middle of the spacer, with two alternate extra loops (D2-2 and D3-2) forming a roughly symmetrical structure (Fig. 2d). In A. sulcatus (Fig. 2g), although the repeats showed very low identity, two extra loops were found. In P. serratus (Fig. 2h), where the repeats are located at two different positions, two extra loops were present at two different positions (D2-1 and D2-2).
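The loop and stem counts compared above can be tallied automatically from a predicted structure. A minimal sketch, assuming the structure is available in dot-bracket notation ("(" and ")" for paired bases, "." for unpaired ones); the function names are hypothetical, and stems are defined strictly as runs of directly stacked pairs, so a bulge or internal loop splits a helix into two stems, which may not match how helices are delimited in Fig. 2.

```python
def base_pairs(dot_bracket):
    """Return the (i, j) base pairs encoded by a dot-bracket string."""
    stack, pairs = [], []
    for pos, char in enumerate(dot_bracket):
        if char == "(":
            stack.append(pos)
        elif char == ")":
            if not stack:
                raise ValueError(f"unbalanced ')' at position {pos}")
            pairs.append((stack.pop(), pos))
    if stack:
        raise ValueError("unbalanced '(' in structure")
    return pairs


def count_hairpins(dot_bracket):
    """A hairpin loop is a pair enclosing only unpaired bases: '(' then ')' once dots are removed."""
    return dot_bracket.replace(".", "").count("()")


def count_stems(dot_bracket):
    """Count stems as maximal runs of stacked pairs (i, j), (i+1, j-1), ..."""
    pairs = set(base_pairs(dot_bracket))
    # A pair starts a new stem unless the pair just outside it, (i-1, j+1), also exists
    return sum((i - 1, j + 1) not in pairs for i, j in pairs)


# Hypothetical structure: one outer stem closing a multiloop with two hairpins
structure = "((..((...))..((...)).))"
print(count_hairpins(structure), "hairpins,", count_stems(structure), "stems")
```

Counting hairpins per structure gives a quick way to compare how many extra terminal loops repeat units add relative to a repeat-free spacer such as that of A. macrocheles.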

Fig. 2

Analysis of the secondary structure of ITS1 in selected crustaceans. Hairpin structures were formed at the lowest free-energy values; the optimal folding was selected according to the revised energy rules, which tend to yield more “correct” base pairs. (a) Alpheus macrocheles, (b) A. floridanus, (c) A. cristulifrons, (d) A. lottini, (e) A. rostratus, (f) A. cylindricus, (g) A. sulcatus, (h) Palaemon serratus

Conserved last ITS1 stems and phylogeny of Alpheidae

The stems at the 3′ end of ITS1 were aligned and compared among Alpheidae. Pairwise sequence identities ranged from 0.602 to 0.976 (60.2 to 97.6%) among all species except the more divergent A. sulcatus (Table 3). These last stems contained splicing sites and conserved sequences in the last helix, helix 4 (Table 4).
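Pairwise identities such as those in Table 3 can be computed directly from an alignment of the 3′-end stems. A minimal sketch; it assumes pairwise deletion (columns with a gap in either sequence are skipped), which is an assumption because the gap treatment used for Table 3 is not stated, and the taxon names and toy sequences are hypothetical.

```python
from itertools import combinations


def percent_identity(a, b):
    """Percent identical sites, skipping columns with a gap in either sequence."""
    compared = [(x, y) for x, y in zip(a.upper(), b.upper()) if x != "-" and y != "-"]
    if not compared:
        raise ValueError("no gap-free columns to compare")
    matches = sum(x == y for x, y in compared)
    return 100.0 * matches / len(compared)


def identity_matrix(alignment):
    """Pairwise identities for a dict mapping taxon name -> aligned sequence."""
    return {
        (m, n): percent_identity(alignment[m], alignment[n])
        for m, n in combinations(alignment, 2)
    }


# Hypothetical toy alignment (not data from Table 3)
toy = {"sp1": "ACGTACGTAC", "sp2": "ACGTACGTTC", "sp3": "ACG-ACCTTC"}
matrix = identity_matrix(toy)
for pair, value in matrix.items():
    print(pair, f"{value:.1f}%")
print("range:", round(min(matrix.values()), 1), "-", round(max(matrix.values()), 1))
```

Dividing each value by 100 gives the proportions (such as 0.602 and 0.976) used in the text.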

Table 3 Sequence identity of the last ITS1 stems (conserved 3′ ends) between species of Alpheidae
Table 4 ITS1 splicing sites and conserved sequences (in bold) in helix 4 of the Alpheidae

Importance of ITS1 for delineation of A. lottini cryptic species and comparison of evolution rates with that of COI

The comparison of ITS1 and COI divergence within the A. lottini species complex (Table 5) highlighted two distinct clades (A and B, following the nomenclature of [36]), each with two well-supported sub-clades (A1, A2 and B1, B2; Fig. 3). In clade A, 2.1% divergence was measured between samples from the Indian Ocean (A2) and those from the Pacific (A1) (Table 5a). Within clade B, 3% divergence was measured between the two Pacific sub-clades (Table 5a and Sup. 1). Specimens of both sub-clades were collected in Moorea and Bora Bora, occasionally together in the same colony of the coral Pocillopora damicornis. Sequence divergence between clades A and B and the cryptic A. lottini specimens collected in Fangataufa (clade C) ranged from 11.5 to 12.9%.
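Clade-level divergences of this kind are averages over all pairs of individuals drawn from two clades. A minimal sketch of a between-group mean p-distance, assuming aligned sequences of equal length and pairwise deletion of gapped columns; the function names and toy sequences are hypothetical, and Table 5 values were not necessarily computed this way (they may, for instance, rely on a model-corrected distance).

```python
from itertools import product
from statistics import mean


def p_distance(a, b):
    """Proportion of differing sites, ignoring columns with a gap in either sequence."""
    sites = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not sites:
        raise ValueError("no gap-free columns to compare")
    return sum(x != y for x, y in sites) / len(sites)


def between_group_mean(group_x, group_y):
    """Mean p-distance over every pair made of one sequence from each group."""
    return mean(p_distance(a, b) for a, b in product(group_x, group_y))


# Hypothetical aligned sequences for two clades
clade_1 = ["AAAAAAAAAA", "AAAAAAAAAT"]
clade_2 = ["AAAAAAATTT", "AAAAAA-TTT"]
print(f"{100 * between_group_mean(clade_1, clade_2):.1f}% mean divergence")
```

Because every cross-clade pair contributes equally, clades sampled unevenly across localities (as here for Moorea and Bora Bora) still yield a well-defined average, but one that reflects the individuals sampled.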

Table 5 Pairwise distances between sister clades of Alpheus lottini. Clades are defined in Fig. 3. (a) Above the diagonal: distances based on the 3′ end of ITS1 (last 493 bp, outside the duplicated sequences); below the diagonal: distances based on the total ITS1 sequences. (b) Pairwise COI distances for the same individuals from the same localities
Fig. 3

Phylogeny of Alpheus lottini subspecies inferred using ITS1 sequences from maximum likelihood and Bayesian approaches. GenBank accession numbers are indicated when nucleotide sequences were downloaded from NCBI. Maximum Likelihood bootstrap values (BS; first value) and Bayesian posterior probabilities (PP; in brackets) are indicated as well as the major clades A, B and C delineated in the A. lottini species complex

These values varied from 1% (B1 vs. B2) to 3.5% (A1 vs. A2) when only the 3′ end of ITS1 was considered (Table 5a, above the diagonal). For COI, sequence divergence between sub-clades of clades A and B ranged from 3.1 to 8.5%. Clade C from Fangataufa was 12.6% and 13.8% divergent from clades B and A, respectively (Table 5b and Sup. 2).

Discussion

Most crustacean ITS1 sequences available in the NCBI database are short. Yet our results for a broader range of Caridea show that most taxa have long ITS1 sequences owing to the presence of microsatellites or repeat units. Repeat units were very common in Alpheidae. Recent duplications (84–98% identity) occurred in A. floridanus and A. rostratus. Given the low similarity between repeat units in A. sulcatus, this species may be one of the oldest Alpheidae, as suggested by Williams et al. [37]. “Old duplications” suggest that the duplication arose before the members of the group underwent speciation, in the case of A. sulcatus at the beginning of the radiation of Alpheoidea, whereas “recent duplications” suggest that duplication is an ongoing process. The absence of repeats over long time periods (400–437 mya for Penaeidae and around 150–236 mya for Caridea [6]) may be explained by the importance of maintaining the physical structure of ITS1, given the high specificity of the nucleases involved in pre-rRNA processing; this opens a new field of evolutionary investigation.

In the inferred secondary structures, repeats generally form long hairpins with a conserved motif in the terminal loop, and tandem repeats pair with one another over most of their length. This variability generally excludes ITS1 from phylogenetic applications [3] but suggests that the marker is suitable for population studies. In most eukaryotes investigated, ITS1 consists of an open multibranch loop with several helices [8]. The presence of repeat units strongly affects the number and position of loops. Only three loops have been detected in mosquitoes [2]. In our data the minimum is four (A. macrocheles), as in the molluscan Pectinidae [34] and Haliotidae [30]. Up to seven ITS1 structural domains have been reported in a trematode [32], compared with up to six in our species. A lack of constraint in the middle of the spacer has been proposed [14], but our results suggest that this lack of constraint extends to other domains: either D1 loops are duplicated, as in A. rostratus, A. cylindricus and A. floridanus, or the D2 loop is, as in P. serratus. In A. sulcatus, the two different repeats gave rise to two extra loops, in D1 and D2. In A. lottini, the two extra loops correspond to repeats 2 and 3. In ten ladybird species (Coccinellidae) [33], one repeat was present in the middle of ITS1 and another at the 5′ end, and ITS1 size ranged from 791 to 2572 bp. In our case, only the stem and helix 4 can be properly aligned; they are conserved across families, as in Boraginales [8], which suggests an important role in processing. However, the cleavage steps that release mature rRNA remain poorly understood, as does the specificity of the nucleases involved in processing the 5′ end of ITS1 (Lamanna and Karbstein [19]) or its 3′ end [9].

The existence of two major clades (A and B) within the Alpheus lottini species complex was confirmed with ITS1 sequences. These clades had previously been characterized with COI, with divergences ranging from 10 to 13% (Knowlton and Weigt; [16, 26, 29, 38]), using samples collected in different localities (Sup. 2). Furthermore, we believe that combining different mitochondrial and nuclear markers will allow the detection of both past hybrids (owing to the longer persistence of mtDNA) and recent hybrids (unpublished data on the geographical mode of speciation in the A. lottini complex), as observed in other species [25]. Recently, two of us characterized the complexity of coral-alpheid symbioses using mitochondrial markers [26]. In particular, this revealed that each of the two genetically analysed lineages of A. lottini had different affinities with trapeziid crabs. Identifying the cryptic species involved in these associations and their ecological roles remains challenging with classical markers; ITS1, with its high degree of polymorphism, should be very useful for this purpose.

Our results confirm the presence of a complex of species that evolved separately. Alpheus sp. from Fangataufa, which is morphologically very similar to A. lottini, corresponds to a new clade, 11.5–12.9% divergent from A. lottini clades A and B based on ITS1 sequence alignments, that remains to be studied in detail (Rouzé et al. in prep). Specimens of clade A were present in both the Indian and the Pacific Oceans (A2 and A1, respectively), and the difference between the two sub-clades is not significant [26], although several insertions and deletions are present (Sup. 1). This may reflect the absence of a physical barrier between these oceans during some geological periods [38]. However, it does not exclude a more recent isolation of the Indian Ocean during the Pleistocene glacial events (~700,000 years ago), which caused a 120 m drop in sea level [24]; the limited time since such a separation would not have allowed these genes to differentiate. Extending previous results [16, 38], we found both clades A and B in Moorea and Bora Bora, confirming results obtained by Williams et al. [38] with nuclear genes coding for myosin. Comparing pairwise COI and ITS1 distances, and assuming a COI rate of 1.5% substitutions per million years (Pmy) as proposed by Knowlton and Weigt [17], the ITS1 substitution rate may range from 0.8 to 1.2% Pmy in A. lottini. This rate is high compared with other species [20]: for example, ITS1 rates range from 0.22–0.3% Pmy in Haliotis species [5] to 0.775% Pmy in bivalves [21]. The rates are lower if only the 3′ end of ITS1 (493 bp), located outside the repeat units, is considered: we obtained a substitution rate of about 0.12 to 0.5% Pmy, which suggests that the different units have evolved independently. These values are comparable to those found in other invertebrates [31].
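The ITS1 rate estimate rests on a simple proportionality: if COI accumulates divergence at a known rate, an ITS1 rate follows from the ratio of ITS1 to COI distances measured between the same clades, r_ITS1 = r_COI × d_ITS1 / d_COI. A minimal sketch, assuming uncorrected distances in percent, the 1.5% Pmy COI rate expressed as pairwise divergence, and hypothetical distance pairs; the function name is illustrative, and the sketch ignores saturation and rate heterogeneity among lineages.

```python
def calibrated_rates(distance_pairs, reference_rate=1.5):
    """Convert (ITS1 %, COI %) distance pairs into ITS1 rates (% per My).

    Each pair must come from the same two clades; reference_rate is the COI
    rate, assumed here to be expressed as pairwise divergence per My.
    """
    rates = []
    for d_its1, d_coi in distance_pairs:
        if d_coi <= 0:
            raise ValueError("COI distance must be positive")
        rates.append(reference_rate * d_its1 / d_coi)
    return rates


# Hypothetical distances in percent, for illustration only
pairs = [(2.0, 3.0), (3.0, 3.0)]
rates = calibrated_rates(pairs)
print(f"ITS1 rate: {min(rates):.2f} to {max(rates):.2f}% Pmy")
```

Reporting the minimum and maximum over several clade pairs gives a rate range rather than a single value, and the same calculation applied to the 3′-end distances alone yields the lower rates obtained outside the repeat units.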
Overall, duplication events may have occurred from about 2 mya for the most recent ones (98% identity) to nearly 100 mya for the oldest ones (50 to 56% identity). This indicates that the divergence rates reported here for ITS1 repeat units may be overestimated, and more studies are needed to understand their origin and their functional role in the ITS1 structure. Comparing ITS1 structures also helps to define recently described sister species, such as A. vanderbilti/A. cylindricus or A. cristulifrons/A. utriensis, which share the same repeat units at the same positions.
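Dating a duplication from the identity between its two copies follows the same logic, but copies sharing only 50 to 56% identity are far into the range where multiple substitutions at the same site go unseen. A minimal sketch that applies the Jukes-Cantor (JC69) correction before dividing by a rate; the JC69 model, the illustrative 1% Pmy rate and the function names are assumptions, not the procedure behind the estimates above. Because both copies keep evolving after a duplication, the rate must be a pairwise divergence rate rather than a per-lineage rate.

```python
import math


def jukes_cantor(p):
    """JC69 distance (substitutions per site) from an observed proportion of differences."""
    if not 0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75) for the JC69 correction")
    return -0.75 * math.log(1 - 4 * p / 3)


def duplication_age(identity, pairwise_rate):
    """Approximate age (My) of a duplication from the identity between its two copies.

    identity: fraction of identical aligned sites (e.g. 0.98)
    pairwise_rate: divergence per My accumulated between the two copies
                   (e.g. 0.01 for 1% Pmy)
    """
    return jukes_cantor(1 - identity) / pairwise_rate


for identity in (0.98, 0.84, 0.56, 0.50):
    raw = (1 - identity) / 0.01  # uncorrected estimate at 1% Pmy
    corrected = duplication_age(identity, 0.01)
    print(f"{identity:.2f} identity: {raw:5.1f} My uncorrected, {corrected:5.1f} My JC69")
```

With 98% identity and a pairwise rate of 1% Pmy this gives roughly 2 My, and the correction barely matters; for identities near 50% the correction becomes large and model-dependent, so such ages are best read as rough orders of magnitude.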

Conclusion

Repeat units should be taken into account when estimating the ITS1 evolution rate. Our results confirm that consensus motifs are not universal, whereas conserved structures, even when repeated, remain important for nuclease activity. More studies are needed to understand the role of repeat units, which vary in number and are constantly evolving. ITS1 could be very useful for tracing the evolution of species complexes and establishing relationships between species. Moreover, it can be used to identify cryptic species or hybrids involved in coral-alpheid symbioses.