Introduction

Recent analyses of partial or complete genome sequences for various model organisms have uncovered considerable sequence conservation within intron sequences of protein-coding genes, even for comparisons as deep as 500 Mya (e.g., Margulies et al. 2003; Siepel et al. 2005). Among the factors that are correlated with the amount of intronic divergence are guanine and cytosine (GC) content, intron length and ordinal number, gene function, and nucleotide position within the intron (e.g., Hare and Palumbi 2003; Chamary and Hurst 2004; Halligan et al. 2004; Haddrill et al. 2005; Gaffney and Keightley 2006; Vinogradov 2006). Conserved intron sequences often exhibit extensive predicted secondary structure for which a putative functional role can sometimes be inferred, such as synthesis of microribonucleic acids (microRNAs), involvement in RNA editing, or control of intron splicing and/or polyadenylation (Aruscavage et al. 2000; Chen and Stephan 2003; Margulies et al. 2003; Rodova et al. 2003; Siepel et al. 2005; Pedersen et al. 2006). Chen and Stephan (2003) used site-directed mutagenesis to show that increased stability of intron 1 in the Drosophila alcohol dehydrogenase gene (i.e., a 50% increase in the length of a hairpin stem) was associated with reduced splicing efficiency and lower protein production. Other possible functional roles of intron secondary structure in messenger RNA (mRNA) genes have been reviewed by Buratti and Baralle (2004).

With some exceptions (e.g., Hare and Palumbi 2003; Margulies et al. 2003), much less is known about conserved intron sequences in nonmodel organisms. For example, with at least 66 described genera, the sea star order Forcipulatida is one of the most taxon-rich orders of sea stars, second only to the order Valvatida, yet genomic sequences are currently available for only a small number of protein-coding genes in this order, and for only seven species. Comparative genomic sequence data for forcipulates exist only for histone genes (Cool et al. 1988). Here, a novel 140-basepair (bp) repeat sequence within two nonadjacent introns of the ATP synthase β-subunit gene from forcipulate sea stars and some related taxa is described and characterized. The repeat sequence shows extensive predicted secondary structure, evolves at about 25% of the rate of the flanking nonrepetitive intron sequences, and has apparently undergone several episodes of gene conversion. The repeat is absent from the homologous introns in several distantly related sea stars and is also absent from the sea urchin (Strongylocentrotus purpuratus) genome.

Materials and Methods

DNA was extracted from fresh, frozen, or alcohol-preserved sea star tissues using the Sigma-Aldrich GenElute kit for mammalian DNA (St. Louis, MO 63178), with final elution in sterile water. Techniques for cloning, amplifying, and sequencing these introns have been described elsewhere (Foltz et al. 2007b). Contigs were assembled in Sequencher v. 4.0 (Gene Codes Corp., Ann Arbor, MI 48108). Partial or complete sequences of one or both introns were obtained for up to 24 forcipulate species representing 19 genera [18 genera within the family Asteriidae plus one genus (Pedicellaster) in the family Pedicellasteridae], as well as the paxillosid Bathybiaster loripes, the valvatids Chitonaster johannae and Cladaster validus, the spinulosid Henricia sanguinolenta, and the velatids Cuenotaster involutus, Diplopteraster multipes, and Remaster gourdoni (Table A-1 in Supplementary Material).

Introns 5 and 7 of the ATP synthase β-subunit gene were located by BLAST searching with a sea urchin Strongylocentrotus purpuratus complementary DNA (cDNA) sequence (AY580283) query against the S. purpuratus genome (http://www.hgsc.bcm.tmc.edu/blast/blast.cgi?organism=Spurpuratus, BAC plus WGS assembly version 2.0 dated June 15, 2006). Because the sequence AY580283 is incomplete at both ends, the sea star introns were originally numbered by comparison to a complete human genomic sequence for this gene (M19482, M19483). Intron 5 corresponds to intron E in the human genome and intron 7 corresponds to intron G. The numbering was subsequently corroborated by BLAST searching with a predicted S. purpuratus ATP synthase β-subunit mRNA sequence (XM_778200.2) query against the S. purpuratus genome. Except for a few sequences for intron 6 mentioned below, there are no other sequence data available for intron or exon regions of the ATP synthase β-subunit gene in forcipulate sea stars, so the intron numbering in sea stars is tentative.

Intron sequences in nuclear protein-coding genes often show nucleotide-site heterozygosity and/or length variation heterozygosity (LVH) within an individual (e.g., Creer et al. 2005). The various methods for phase determination for nucleotide-site heterozygosity and LVH have been reviewed by Zhang and Hewitt (2003), Kwok and Xiao (2004), and Flot et al. (2006); most of these methods require sequence data from several or large numbers of individuals within a species. In this study, 33 nucleotide sites were identified that were fixed (homozygous) for alternate nucleotides in individuals of the same or a closely related species, but that also showed putative heterozygosity (two approximately equally sized chromatogram peaks) in one or more additional individuals. The two peak heights were measured at the 33 putatively heterozygous sites, and the 95th percentile of the low/high peak-height ratio (68%) was used in Sequencher to call heterozygosity for these sites and for other sites that lacked homozygotes among the sampled individuals. LVH was inferred when an otherwise normal chromatogram became unreadable due to superposition or mismatch of two peaks at most positions downstream from the inferred indel site, as in Palumbi and Baker (1994), Sokolova and Boulding (2004), Bhangale et al. (2005), and Creer et al. (2005). LVH heterozygotes were resolved with the method of Sousa-Santos et al. (2005), which allows allele sequences to be recovered even if one or both homozygous sequences are not present in the data set. This method works best when indels are short, the sequences have low complexity [e.g., are adenine and thymine (AT)-rich], there are few artifactual secondary peaks apart from the LVH, and multiple dispersed indels within a short region are rare. These conditions were mostly met in the present study. In particular, each of the inferred indels in the present study was 1 bp long, and there was only one instance of an inferred pair of 1 bp indels in a single sequence (see Table A-1 for details).
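The peak-ratio rule for nucleotide-site heterozygosity can be illustrated with a minimal sketch. It assumes that a site is called heterozygous when the lower/higher peak-height ratio reaches the cutoff; the 0.68 default is the ratio reported above, whereas the function name and the direction of the comparison are illustrative assumptions and do not describe Sequencher's internal behavior.

```python
def call_heterozygous(peak_a, peak_b, cutoff=0.68):
    """Return True if the lower/higher peak-height ratio reaches the cutoff.

    Hypothetical helper: the comparison direction (ratio >= cutoff) is an
    assumption made for illustration.
    """
    if peak_a <= 0 or peak_b <= 0:
        raise ValueError("peak heights must be positive")
    low, high = sorted((peak_a, peak_b))
    return low / high >= cutoff
```

For example, peak heights of 700 and 1000 give a ratio of 0.70 and would be called heterozygous under this assumption, whereas 300 and 1000 would not.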

The ATP synthase β-subunit gene is single copy in the sea urchin Strongylocentrotus purpuratus genome [e.g., as shown by BLAST searching with a sea urchin ATP synthase β-subunit partial cDNA sequence (AY580283) query against the S. purpuratus genome]. Although comparable genomic data are not yet available for sea stars, within-individual nucleotide diversity values for each intron obtained as described above showed no evidence for inflated values. Excess diversity within individuals, compared to known single-copy nuclear genes, might be expected if the primers frequently and inadvertently amplify two or more paralogous loci (e.g., Smith et al. 2005), particularly if the divergence times of paralogous copies greatly exceed the average divergence time of within-locus allele copies. Here, within-individual diversity values for each intron were comparable to the values obtained for a second, presumably single-copy nuclear gene, elongation factor-1 α subunit, intron 4, as shown by a sign test [P > 0.05 for each intron of the ATP synthase β-subunit gene; see Foltz et al. (2007b) for further discussion]. In addition, paralogous loci might be expected to harbor indel variation between loci as well as nucleotide-site variation, leading to the expectation that chromatograms would sometimes show complex patterns of superposition of peaks, caused by the presence of three or more length variants in single individuals, that are not resolvable by the method of Sousa-Santos et al. (2005). However, each putative case of LVH observed in the present study (Table A-1) could be resolved unambiguously into two sequences, which provides further evidence against the possibility that the ATP synthase β-subunit gene is multicopy in sea stars. See the section Analysis of phylogenetic signal and rates of evolution for phylogenetic evidence against the existence of paralogous copies of this gene.

For each intron, separate alignments were generated in Clustal-X v. 1.8 (Thompson et al. 1997); these alignments (Tables A-5 and A-6 in the Supplementary Material) contained 12 species each and omitted those species with incomplete sequence reads or for which the regions that flank the repeat region could not be aligned with confidence (see Table A-1 for details on the missing data). A separate Clustal-X alignment was produced for 44 copies of the 140-bp repeat (Table A-7 in Supplementary Material).

The two full-length intron sequences for Lethasterias nanimensis were processed through the NCBI ORF-finder (http://www.ncbi.nlm.nih.gov/gorf/gorf.html), and each conceptual translation greater than 12 amino acids in any of the six potential reading frames was subjected to an unrestricted BLASTp search. The 140-bp repeat sequences for L. nanimensis were also subjected to an unrestricted BLASTn search in GenBank and were also compared via BLASTn to the sea urchin Strongylocentrotus purpuratus genome (see citation above).
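The six-frame screen for conceptual translations can be sketched as follows. This is a minimal illustration, not the NCBI ORF-finder algorithm: it assumes the standard genetic code and treats every stop-to-stop segment longer than 12 amino acids as a candidate, without requiring a start codon. The function names are hypothetical.

```python
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, built from the conventional TCAG codon ordering.
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(dna):
    """Translate complete codons; codons with ambiguous bases become 'X'."""
    codons = (dna[i:i + 3] for i in range(0, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(c, "X") for c in codons)


def six_frame_orfs(dna, min_aa=13):
    """Return (strand, frame, peptide) for stop-free segments of >= min_aa."""
    dna = dna.upper()
    strands = {"+": dna, "-": dna.translate(COMPLEMENT)[::-1]}
    hits = []
    for strand, seq in strands.items():
        for offset in range(3):
            protein = translate(seq[offset:])
            for segment in protein.split("*"):
                if len(segment) >= min_aa:
                    hits.append((strand, offset + 1, segment))
    return hits
```

Each peptide returned by `six_frame_orfs` would then be submitted to a protein database search; that step is not shown here.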

Best-fit models were obtained in ModelTest v. 3.7 using the Akaike information criterion (Posada and Buckley 2004) for five sequence datasets: [1] intron 5–flanking region, [2] intron 7–flanking region, [3] intron 5–repeat region, [4] intron 7–repeat region, and [5] intron 5-repeat + intron 7-repeat. The best-fit models and parameter estimates for parametric bootstrapping are shown in Table A-4 [Supplementary Material — also see Posada (2005) for model details]. For each alignment, a heuristic search under maximum likelihood (ML) was performed in PAUP* v. 4.0b10 (Swofford 2003), starting from a neighbor-joining tree and using the best-fit model of substitution. Nonparametric bootstrap replicates (N = 200) for each set [1]–[5] were generated with the SeqBoot program in Phylip v. 3.65 (Felsenstein 2005) and analyzed in PAUP* via ML, using the corresponding best-fit model. Bootstrap values from an extended majority-rule consensus tree calculated by the Consense program in Phylip for each alignment were transferred to the original ML tree, and nodes with less than 50% support were collapsed.
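The nonparametric bootstrap step amounts to resampling alignment columns with replacement. The sketch below illustrates that idea only; it is not the SeqBoot implementation, and the dictionary-based alignment format, function names, and seed are assumptions made for illustration.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Build one pseudoreplicate by sampling columns with replacement."""
    length = len(next(iter(alignment.values())))
    if any(len(seq) != length for seq in alignment.values()):
        raise ValueError("all aligned sequences must have equal length")
    columns = [rng.randrange(length) for _ in range(length)]
    # The same sampled columns are applied to every taxon, so each
    # resampled site keeps its original pattern across taxa.
    return {name: "".join(seq[i] for i in columns)
            for name, seq in alignment.items()}


def bootstrap_replicates(alignment, n_replicates=200, seed=0):
    """Return a list of pseudoreplicate alignments (200 by default)."""
    rng = random.Random(seed)
    return [bootstrap_alignment(alignment, rng) for _ in range(n_replicates)]
```

Each pseudoreplicate would then be analyzed under the same best-fit model as the original alignment, and the resulting trees summarized in a consensus.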

Results and Discussion

Occurrence of the Repeat

The repeat was initially identified by manual comparison of separate alignments for the two introns of the ATP synthase β-subunit gene, and was found to occur in every copy of intron 5 and intron 7 that was sequenced in forcipulate sea stars. In all, 28 species were sequenced for all or part of intron 5, and 33 species were sequenced for all or part of intron 7, for a total of 148 sequence determinations (Foltz et al. 2007b). The extent of the repeat was determined as follows. For each of the 18 forcipulate species for which complete or nearly complete sequences for both introns were available, plus Bathybiaster loripes, Cuenotaster involutus, Diplopteraster multipes, Henricia sanguinolenta, and Remaster gourdoni, the two introns were aligned against each other (http://www.ncbi.nlm.nih.gov/blast/bl2seq/wblast2.cgi). This within-species analysis consistently identified a 140-bp matching sequence with high (93–100%) between-copy similarity in forcipulate sea stars plus Diplopteraster, with the following exceptions (see Tables A-5, A-6, and A-7 in Supplementary Material for details): Notasterias pedicellaris had a 1 bp insertion within intron 7; Astrometis sertulifera and Coscinasterias tenuispina each had a 3 bp deletion in intron 7; Pedicellaster magister had three insertions totaling 7 bp within intron 7, a 2 bp deletion within intron 7, and a 7 bp segment at the start of the alignment with little similarity between the two introns; Adelasterias papillosa had a 10-bp segment at the start of the alignment with little similarity between the two introns or to the other sequences; and Diplopteraster multipes had deletions totaling 4 bp in each intron, insertions totaling 7 bp in intron 5, and insertions totaling 4 bp in intron 7. For seven pairwise alignments, the repeat region extended 2 bp in the 5′ direction with 5′TT3′. 
For five pairwise alignments, the repeat extended 6 bp in the 3′ direction with 5′ACTGTT3′; for an additional six pairwise alignments, the repeat extended in the 3′ direction by the consensus sequence 5′DCYBTT3′. These 5′ and 3′ extensions to the 140-bp repeat are not included in subsequent analyses of the repeat sequence, but are included in the analysis of the flanking sequence. Alignment of intron 7 in Remaster gourdoni to a consensus of the 140-bp repeat from forcipulate intron 7 sequences showed that the repeat is present in intron 7 of this species but absent from intron 5. No copy of the 140-bp repeat, and no other significant sequence similarity between introns 5 and 7, could be detected in Bathybiaster loripes, Chitonaster johannae, Cladaster validus, Cuenotaster involutus, and Henricia sanguinolenta. Only intron 7 sequences were available for Chitonaster and Cladaster.
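A degenerate consensus such as 5′DCYBTT3′ summarizes the bases seen at each aligned position using IUPAC ambiguity codes. The following sketch shows how such a consensus can be derived from ungapped sequences of equal length; the function name is hypothetical, and the input sequences in the usage note are invented for illustration and are not taken from the supplementary alignments.

```python
# IUPAC nucleotide ambiguity codes keyed by the set of observed bases.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C",
    frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y",
    frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D",
    frozenset("ACT"): "H", frozenset("ACG"): "V",
    frozenset("ACGT"): "N",
}


def iupac_consensus(sequences):
    """Return the IUPAC consensus of ungapped, equal-length DNA sequences."""
    seqs = [s.upper() for s in sequences]
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("sequences must be non-empty and of equal length")
    return "".join(IUPAC[frozenset(column)] for column in zip(*seqs))
```

With the made-up inputs `["ACTGTT", "GCCTTT", "TCTCTT"]`, each column is reduced to one code (D for A/G/T, Y for C/T, B for C/G/T), so the six columns collapse to a single six-letter motif.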

For Lethasterias nanimensis, no significant protein matches were found in BLASTp searches of all open reading frames (ORFs) >12 amino acids, and unrestricted BLASTn searches against GenBank or the S. purpuratus genome produced only short matches (<24 bp) with high E-values (>0.36). Additionally, the repeat sequence was absent from intron 6 of the ATP synthase β-subunit gene in forcipulates (unpublished data of Foltz) and absent from the elongation factor-1 α subunit gene (Ef-1α) introns in forcipulates (Wada et al. 2002; unpublished data of Foltz). However, a 125-bp portion of intron 5 of the ATP synthase β-subunit gene in the genus Asterias that is outside the repeat region did show significant (E-value = e−12) similarity to the complement of intron 1 of the Ef-1α gene in A. amurensis (Wada et al. 2002; details in Table A-2). The repeat thus appears to be a novel sequence whose functional significance, if any, does not involve coding for a protein.

Secondary-Structure Prediction

Analysis of secondary structure in full-length intron sequences was done by using RNAz v. 1.0 (see Table A-1 for details on sequences included in the analysis). This program combines minimum-free-energy calculations with analysis of sequence conservation to predict secondary structure using a machine-learning algorithm trained on a test set of noncoding RNAs (Washietl et al. 2005). Due to program limitations, only six sequences for each intron, chosen to approximate a star phylogeny, were included in the analysis. For intron 7, the prediction was structural RNA with 99% probability, whereas for intron 5, the prediction was nonstructural RNA with 23% probability. When the analysis for intron 5 was repeated with a 120-bp sliding window in 40-bp increments, only the window that was nested within the 140 bp repeat sequence returned a prediction of structural RNA, with a 96% probability.
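The sliding-window scan can be sketched as follows. This is a minimal illustration of the windowing only (120-bp windows advanced in 40-bp steps); it operates on a single sequence rather than on alignment columns, it does not call RNAz, and the function names and coordinates are assumptions.

```python
def sliding_windows(sequence, window=120, step=40):
    """Yield (start, end, subsequence); start is 0-based, end is exclusive."""
    for start in range(0, len(sequence) - window + 1, step):
        yield start, start + window, sequence[start:start + window]


def windows_within(windows, region_start, region_end):
    """Return coordinates of windows nested entirely inside a region."""
    return [(s, e) for s, e, _ in windows
            if s >= region_start and e <= region_end]
```

Given the coordinates of the repeat within an intron, `windows_within` identifies which windows fall wholly inside it; each window would be scored separately by the structure-prediction program.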

Final secondary-structure predictions were done on separate alignments of the 140 bp repeat sequence for intron 5 (20 species) and intron 7 (24 species), using the program Alifold (Hofacker et al. 2002) on the Vienna RNA server (rna.tbi.univie.ac.at/cgi-bin/alifold.cgi). For each 140-bp repeat sequence, the same double-hairpin structure was predicted (Fig. 1). These predictions were done using RNA parameters and after removing the rare insertions noted above as per the recommendations of the program author. When DNA parameters were used, an identical double-hairpin structure was predicted for each repeat. Including the sequence 5′ACTGTT3′, which is present immediately downstream from position 140 of the repeat in all intron 7 sequences, did not change the predicted secondary structure of that repeat.

Fig. 1

Secondary structure of the 140-bp repeat for intron 5 (A) and intron 7 (B) of the ATP synthase gene, β subunit. Numbers show position relative to the 5′ end of the repeat. Variable sites are underlined. Not shown are inferred insertions at positions 5, 17, 58 or 110, and inferred deletions of positions 5–7, 41, 47, or 131–132 (details are in Table A-7; because of indels, the numbering of nucleotide positions within the secondary structure does not correspond to the alignment shown in the Supplementary Material)

With one exception (a GG pair at positions 34 and 44 in intron 5 for Adelasterias papillosa, Diplasterias sp., and Notasterias pedicellaris: see Table A-3 in the Supplementary Material for details), all of the inferred differences that abolished an internal base pair in Fig. 1 were singleton (autapomorphic) differences, and many of these differences (6 of 12 for intron 5, and 15 of 17 for intron 7) involved the nonasteriid species Diplopteraster multipes and Pedicellaster magister. Two sites for intron 5 in Marthasterias glacialis were heterozygous, with only one of the nucleotides able to form an internal base pair, and the same was true for one site in intron 7 in Lysasterias perrieri. The largely autapomorphic nature of basepair-abolishing differences within the putative stem regions within the family Asteriidae, together with the putative heterozygosity in L. perrieri and M. glacialis, suggests that these differences are mostly either sequencing errors or mildly deleterious mutations that do not persist over long evolutionary time periods. Most of the differences that maintained the internal basepairs shown in Fig. 1 were TG ⇔ TA (three instances in intron 5 and five instances in intron 7) or TG ⇔ CG (six instances in intron 5 and two instances in intron 7). There were also four instances of possible compensatory substitutions (CG ⇔ TG ⇔ TA) in intron 7. Finally, there were multiple instances of an AT basepair within a stem region (for positions 34 and 44 of intron 5 in Sclerasterias eustyla and Stephanasterias albula, for the same positions of intron 7 in Astrometis sertulifera, Coscinasterias tenuispina, Marthasterias glacialis, and S. eustyla, for positions 100 and 115 of both introns in D. multipes, and for positions 19 and 59 of intron 5 in D. multipes) that was two mutational steps away from the other basepairs observed at this location in the secondary structure. 
There was also one CG base pair within a putative stem region (for positions 32 and 46 of intron 5 in D. multipes) that was two mutational steps away from the other basepair observed at this location. Because of the poor phylogenetic signal present in the two repeats and the possible occurrence of gene conversion (see below), no attempt was made to infer the mutational history of any internal basepair in either intron.
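The classification of stem differences rests on two simple checks: whether two bases can pair, and how many substitutions separate two observed pairs. A minimal sketch is given below, using DNA letters as in the text, so that TG stands for the wobble pair written GU in RNA. The set of permitted pairs and the function names are assumptions made for illustration.

```python
# Watson-Crick pairs plus the G-T (G-U) wobble pair, in both orientations.
PAIRS = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G"), ("G", "T"), ("T", "G")}


def can_pair(x, y):
    """Return True if bases x and y can form a stem base pair."""
    return (x.upper(), y.upper()) in PAIRS


def mutational_steps(pair_a, pair_b):
    """Count the positions (0, 1, or 2) at which two base pairs differ."""
    return sum(a != b for a, b in zip(pair_a.upper(), pair_b.upper()))


def paired_intermediates(pair_a, pair_b):
    """For pairs two steps apart, list single-step intermediates that pair."""
    a, b = pair_a.upper(), pair_b.upper()
    candidates = {b[0] + a[1], a[0] + b[1]}
    return sorted(p for p in candidates if can_pair(p[0], p[1]))
```

Under these assumptions, CG and TA are two steps apart but connected through the pairing intermediate TG, which is the pattern described above as a possible compensatory substitution, whereas a GG combination cannot pair at all.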

Evidence for Gene Conversion

Concerted evolution of nonadjacent intron repeat sequences has previously been reported for the tenascin-X gene in mammals (Hughes 1999), and may be a common feature of deuterostome noncoding repeat sequences (Jackson et al. 2005; Ezawa et al. 2006). To test this possibility, the program GENECONV (Sawyer 1999) was used to identify putatively converted tracts in within-species comparisons, using an alignment of repeat sequences (Table A-7 in the Supplementary Material) that included the short inferred insertions and deletions that had been omitted from the secondary-structure analysis. As used here, GENECONV looks for maximal segments of introns 5 and 7 within a particular species that are identical and unusually long, or that have high similarity based on a mismatch penalty scheme. Statistical significance is assessed by a Bonferroni-corrected permutation procedure. The program also looks for tracts that may have originated from outside the pairwise alignment; these are called ‘outer’ fragments in Table 1, as opposed to the ‘inner’ fragments based on comparison of the two introns within a species, following the notation of Sawyer (1999). This analysis identified two nearly overlapping tracts of putative gene conversion with an approximate length of 100 bp (see Table 1, and Table A-7 in the Supplementary Material) in Adelasterias papillosa and Stephanasterias albula. There is also a shorter (32 bp) tract of putative gene conversion in Pedicellaster magister. Because the complete genomic sequence of the ATP synthase β-subunit gene is not currently available for any sea star, it is also possible that there are other copies of the repeat present in other introns of this gene, or outside the gene, that could be the source of the conversion event. 
As noted above, Adelasterias papillosa had a 10-bp segment with little similarity between the two introns; this segment is responsible for the fact that the two repeat copies from this species do not cluster together in the phylogenetic analysis (see below and Fig. 2). Three short outer fragments were identified, all in intron 7: Adelasterias papillosa (8 bp), Marthasterias glacialis (9 bp), and Stephanasterias albula (9 bp). Excluding the three species with evidence for pairwise inner gene conversion events from Table 1, there was still a significant correlation between the two repeat copies in polymorphism versus monomorphism for individual nucleotide sites, when stem and loop regions were analyzed separately in 2 × 2 tests of independence (Table 2). When a more-stringent criterion for polymorphism was used (in which singleton sites were ignored on the grounds that they are more likely to be artifactual than are nonsingleton polymorphisms), the correlation in polymorphism level between the two introns remained significant only for the loop regions in Fig. 1.
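The 2 × 2 test of independence for correlated polymorphism can be sketched as follows. The text does not state which test statistic was used, so the Pearson chi-square without continuity correction shown here is an assumption, as are the function names and the Boolean encoding of sites (True for polymorphic, False for monomorphic).

```python
import math


def polymorphism_table(sites_5, sites_7):
    """Cross-classify homologous sites by polymorphism in each repeat copy.

    Rows: intron 5 polymorphic / monomorphic.
    Columns: intron 7 polymorphic / monomorphic.
    """
    table = [[0, 0], [0, 0]]
    for p5, p7 in zip(sites_5, sites_7):
        table[0 if p5 else 1][0 if p7 else 1] += 1
    return table


def chi_square_2x2(table):
    """Return (chi-square, P) for a 2 x 2 table with 1 degree of freedom."""
    (a, b), (c, d) = table
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("a row or column total is zero")
    chi2 = n * (a * d - b * c) ** 2 / denom
    # Upper-tail probability of a chi-square variate with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value
```

In this scheme, stem and loop sites would be tabulated and tested separately, giving one table per region.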

Table 1 List of putatively significant (P < 0.05) gene conversion events for introns 5 and 7 of the ATP synthase gene in forcipulate sea stars

Fig. 2

Maximum-likelihood tree from the analysis of the 140-bp repeat region obtained from intron 5 and intron 7 of the ATP synthase gene in sea stars. Each sequence is annotated in curly brackets as being from either intron 5 or intron 7. Numbers to the left of the basal polytomy show nonparametric bootstrap percentages from 200 pseudoreplicates. Nodes with less than 50% bootstrap support have been collapsed. The scale bar shows the expected number of substitutions. Superscript letters a to e denote pairs of sequences that were identical over the entire 140-bp repeat

Table 2 Tests for correlation in polymorphism levels between introns 5 and 7 for stem and loop regions of the ATP synthase gene in sea stars

Maximum-likelihood analysis in PAUP* of the 44 repeat sequences using the best-fit model of substitution (Table A-4 in Supplementary Material) resulted in a poorly resolved tree (Fig. 2), when nodes with less than 50% bootstrap support from 200 pseudoreplicates were collapsed. The apparent basal polytomy involving 26 of 44 sequences reflects the weak support for deep nodes in the Asteriidae + Pedicellasteridae clade. Most of the seven well-supported clades consist of repeats of the same intron from pairs of taxa known from previous work (Foltz et al. 2007a and unpublished data) to be closely related. Five pairs of identical sequences that lacked appreciable bootstrap support (see Fig. 2 for details) also involved pairs of closely related taxa. However, while four of those instances involved repeats from the same intron, the repeat in intron 5 of Leptasterias alaskensis was identical to the repeat in intron 7 of Evasterias retifera. Finally, for Urasterias lincki, there was only a 1 bp difference between the repeat from intron 5 and the repeat from intron 7 (see Table A-7 in Supplementary Material). This similarity was not detected as a putative case of gene conversion by GENECONV because conversion tracts are not allowed to occupy the entire length of the alignment (Sawyer 1999). These correlations suggest that concerted evolution of the two introns is more pervasive than is suggested by the gene conversion analysis. Possible explanations for the pattern of correlated polymorphism include small (<8 bp) gene conversion tracts, longer gene conversion tracts that have been obscured by subsequent homologous recombination or mutation, and correlation between the two introns in the pattern of selective constraints.

Analysis of Phylogenetic Signal and Rates of Evolution

A separate phylogenetic analysis was done in PAUP* to study the possible effects of gene conversion and other processes that affect sequence homology on phylogenetic signal in the repeat region. A maximum-likelihood analysis for each intron was performed, comparing the repeat region with the respective flanking sequence. This analysis took advantage of the existence of a prior phylogeny for 14 of the forcipulate species in Table A-1 (Foltz et al. 2007a), based on 4.2 kb of DNA sequence representing seven data partitions [nuclear 18S ribosomal RNA (rRNA) and 28S rRNA, mitochondrial 12S rRNA, 16S rRNA, five transfer RNAs (tRNAs), and cytochrome oxidase I with first and second codon positions analyzed separately from third codon positions]. Two species analyzed here (Leptasterias stolacantha and Evasterias retifera) were not included in the analysis of Foltz et al. (2007a) and have been placed in the a priori tree topology (Fig. A-2 in the Supplementary Material) as sister taxa to L. muelleri and the genus Leptasterias, respectively, based on unpublished data of Foltz. Parametric bootstrap replicates (N = 200) were then generated with the program Seq-Gen v. 1.3.1 (Rambaut and Grassly 1997) separately for each alignment, using the best-fit models obtained previously and ML parameter estimates obtained in PAUP* for the a priori tree (see Table A-4 in the Supplementary Material for details). The replicates were then analyzed in PAUP* via heuristic searches identical to the original ML analyses described above, and the likelihood scores exported for two trees – the ML tree and the a priori tree (Fig. A-2). The difference in log-likelihood values was calculated for each replicate in SAS v. 9.1.3 (SAS Institute 1989) and ranked. There was a significant (P < 5%) difference in topology between the ML tree (Fig. A-1 in the Supplementary Material) and the a priori tree for each repeat, and the difference for intron 7 was significant after Bonferroni correction (Table 3). 
There was no significant difference in topology between the ML trees and the a priori tree for the two flanking regions. The difference in phylogenetic signal between each repeat sequence and its flanking regions probably reflects the cumulative effect of past gene conversion events and ongoing selective constraints within each repeat sequence. Theoretical (e.g., Teshima and Innan 2004) and empirical (e.g., Figueroa et al. 2000; Jackson et al. 2005; Ezawa et al. 2006) studies both show that such processes can result in abrupt changes in phylogenetic signal across exon/intron boundaries and other transition points. The correspondence of the phylogeny derived from the flanking sequences for each intron to the a priori topology derived from mitochondrial and nuclear rDNA sequences (Foltz et al. 2007a) provides evidence against frequent amplification and sequencing of paralogous gene copies as an explanation for the lack of phylogenetic signal observed in the repeat regions.
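The ranking step of the parametric bootstrap test can be sketched as follows. The test statistic is the difference in log-likelihood between the ML tree and the a priori tree; the observed difference is compared with the differences obtained from the simulated replicates. The function name and the add-one correction are assumptions, since the text states only that the differences were ranked.

```python
def parametric_bootstrap_p(observed_delta, simulated_deltas):
    """Estimate P as the share of simulated differences >= the observed one.

    The +1 in numerator and denominator counts the observed value itself,
    which avoids reporting P = 0 (an assumption, not the authors' stated
    procedure).
    """
    n = len(simulated_deltas)
    if n == 0:
        raise ValueError("at least one simulated replicate is required")
    as_extreme = sum(d >= observed_delta for d in simulated_deltas)
    return (as_extreme + 1) / (n + 1)
```

With 200 replicates, an observed difference exceeded or matched by five simulated values would give P = 6/201, or about 0.03.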

Table 3 Test of goodness of fit of the maximum-likelihood (ML) tree reconstructed for each of four gene regions to the a priori topology from Foltz et al. (2007a), with statistical significance assessed by parametric bootstrapping

ML distances under the best-fit model for each alignment were exported from PAUP* and analyzed by linear regression in SAS, with significance levels assessed by Mantel tests in the program zt v. 1.0 (Bonnet and Van de Peer 2002). The results (Table 4) showed that the repeat-flanking region of intron 5 is evolving approximately 14% faster than the corresponding region of intron 7, that the repeat in intron 5 is evolving approximately 61% faster than the repeat in intron 7, and that each repeat is evolving 20–25% as fast as its respective flanking region. Within each intron, the stem regions (Fig. 1) are evolving 32–45% as fast as the respective loop regions, but only for intron 7 is the correlation in rates significant. There was no significant correlation in rate of evolution between the two repeats. Comparison to nuclear small-subunit ribosomal (18S rDNA) sequence distances recalculated from Foltz et al. (2007a) showed that the loop regions of each repeat are evolving about 44–52 times faster than the 18S gene and that the stem regions are evolving about 3.5–7 times faster (details not shown).
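The relative-rate comparison can be sketched as a regression of one set of pairwise distances on another, with a Mantel permutation test for significance. The code below is an illustration only, not the SAS or zt procedure: the least-squares slope with an intercept, the one-tailed test, the number of permutations, and all function names are assumptions.

```python
import math
import random


def upper_triangle(matrix, order=None):
    """Flatten the upper triangle, optionally after relabeling the taxa."""
    n = len(matrix)
    idx = list(range(n)) if order is None else order
    return [matrix[idx[i]][idx[j]] for i in range(n) for j in range(i + 1, n)]


def _sums(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy, sxx, syy


def pearson_r(x, y):
    sxy, sxx, syy = _sums(x, y)
    return sxy / math.sqrt(sxx * syy)


def ols_slope(x, y):
    """Slope of y on x; read here as the rate of region y relative to x."""
    sxy, sxx, _ = _sums(x, y)
    return sxy / sxx


def mantel_test(dist_x, dist_y, permutations=999, seed=0):
    """Return (r, one-tailed P) by permuting the taxon labels of dist_y."""
    rng = random.Random(seed)
    x = upper_triangle(dist_x)
    observed = pearson_r(x, upper_triangle(dist_y))
    n = len(dist_y)
    hits = 0
    for _ in range(permutations):
        order = rng.sample(range(n), n)
        if pearson_r(x, upper_triangle(dist_y, order)) >= observed:
            hits += 1
    return observed, (hits + 1) / (permutations + 1)
```

Permuting whole taxon labels, rather than individual distances, respects the non-independence of pairwise distances that share a taxon, which is the reason a Mantel test is used in place of the ordinary regression P value.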

Table 4 Relative rates of sequence evolution for various intronic regions of the ATP synthase gene in forcipulate sea stars, based on pairwise maximum likelihood distances estimated using separate best-fit models for all regions

Origin of the Repeat

Most of the species included in the analysis show a flanking dinucleotide direct repeat (5′TT3′) located immediately upstream of the 140-bp repeat region and two to five nucleotides downstream from the repeat for one or both introns (see Tables A-5 and A-6 in Supplementary Material for details). The downstream member of this short direct repeat is usually incorporated into the 5′RCYBTT3′ motif that was mentioned above. Although this 2-bp direct repeat appears to be too short by itself to be the result of a mobile element insertion, it is possible that the two copies of the repeat are derived from ancient insertion events, with the predicted flanking repeats obscured by subsequent mutations. The length of the repeat is at the lower end of the size range of short interspersed repetitive elements (SINEs, see Ohshima and Okada 2005), and the repeat sequences in Table A-7 show some similarity to conserved motifs in tRNA-derived SINEs, such as CCA in positions 96–98, CTG in positions 110–112, and TG in positions 124–125 [compare Fig. 3 in Okada and Ohshima (1993)]. However, there are no motifs in Table A-7 that are similar to the consensus RNA polymerase III promoter sequences inferred from SINEs in vertebrates and mollusks (Borodulina and Kramerov 1999), and it is unclear whether the repeat is a conventional SINE element, a truncated SINE, or some other type of repetitive sequence. As yet, there are no published reports of SINEs in sea stars (Ohshima and Okada 2005), although a repeat element has been described from the sea urchin Strongylocentrotus purpuratus genome (Nisson et al. 1988). Currently, the only estimates of genome-wide repetitive DNA abundance in sea stars are from kinetic hybridization studies (Fielman and Marsh 2005), which show a percentage of repetitive DNA roughly comparable to sea urchins and other metazoans. 
Whether the repeat sequence in Table A-7 occurs elsewhere in the forcipulate genome will require a broader-scale genomic survey that is beyond the scope of the present study. However, given the small amount of nuclear genomic sequence currently available for sea stars, it is noteworthy that an apparent intergenic insertion or conversion event involving introns has occurred between the elongation factor-1 α-subunit gene and the ATP synthase β-subunit gene in the genus Asterias (Table A-2). A wider survey may well uncover other examples of apparent reticulate evolution in sea star genomes.

The phylogenetic analysis of Foltz et al. (2007a) recovered a monophyletic Asteriidae (sensu Clark and Downey 1992) + Rathbunaster + Pycnopodia with 100% bootstrap support. Pedicellaster was present in a poorly resolved (bootstrap values <70%) polytomy at the base of the forcipulate clade that also included the Zoroasteridae, Heliasteridae, Brisingidae, some Labidiasteridae, and the velatid Pteraster. Dating the origin of the repeat sequence is problematic, in part because the fossil history of sea stars is sparse. This situation is due mainly to the lack of a rigid skeleton and the large body cavity in sea stars, which usually lead to poor preservation (Blake and Zinsmeister 1988; Blake 1990). There are two fossil dates which are relevant to the time of origin of the repeat sequence. A fossil pedicellasterid (Afraster scalariformes) was described from the Upper Cretaceous and assigned to the Pedicellasteridae (Blake et al. 1996). Two fossil asteriids (Germanasterias amplipapularia and Hystrixasterias hettangiurnus) were described by Blake (1990) from the Lower Jurassic and assigned to the Asteriidae (but some zoroasterid characteristics were noted). Assuming a Lower Jurassic age for the mostly asteriid clade recovered by Foltz et al. (2007a) would put the earliest known time of origin of the repeat sequence at 200 Mya. If the two Jurassic fossils are assigned to some other forcipulate lineage, then the pedicellasterid fossil would put the earliest known time of origin at around 84 Mya. The presence of the repeat in both introns 5 and 7 of the pterasterid Diplopteraster multipes might suggest a time of origin that is earlier than 200 Mya, except that the phylogeny of Foltz et al. (2007a) included the pterasterid Pteraster tesselatus within the Forcipulatida. Searching for the repeat in other sea star taxa is hindered by the fact that present data (e.g., Knott and Wray 2000; Janies 2001; Matsubara et al. 2004; Foltz et al. 
2007a) are insufficient to identify the likely sister group to the forcipulate sea stars. The absence of the repeat from the corresponding introns of a paxillosid, two valvatids (data for intron 7 only), a spinulosid, and a velatid sea star indicates that this structure is not a ubiquitous feature of these introns in sea stars. Assuming that gain of the repeat is rare and irreversible (Nikaido et al. 1999; but see Cantrell et al. 2001), the most probable scenario is that the repeat inserted into intron 7 in the lineage leading to Korethrasteridae + Pterasteridae + Forcipulatida, followed by insertion into intron 5 in the Pterasteridae + Forcipulatida lineage. The presence of the repeat in two velatid species (families Korethrasteridae and Pterasteridae) and absence of the repeat from a third velatid species (family Solasteridae) is not necessarily surprising from a phylogenetic standpoint, as several molecular phylogenetic studies (Knott and Wray 2000; Janies 2001; Matsubara et al. 2004) have reconstructed a polyphyletic relationship between the Pterasteridae and the Solasteridae. The present analysis could be strengthened, and the assumptions tested, by further sampling of sea star taxa for introns 5 and 7, and by surveying additional introns from ATP synthase β-subunit and other genes for the presence of the repeat.