Introduction

Siboglinid annelids occur throughout the world’s oceans but are best known from hydrothermal vents, cold seeps, and whale bones (Schulze and Halanych 2003; Rouse et al. 2004; Southward et al. 2005). Chemotrophic endosymbiotic bacteria enable these worms to thrive in these extreme environments (Cavanaugh and Gardiner 1981; Southward and Southward 1981; Halanych 2005; Goffredi et al. 2005; Thornhill et al. 2008). Siboglinidae comprises four lineages: frenulates, vestimentiferans, moniliferans, and Osedax (Hilário et al. 2011). Frenulates, which comprise the majority of known siboglinid species, are often thread-like and found within sediments of reducing habitats (Southward 1978; Southward et al. 2005; Thornhill et al. 2008; Hilário et al. 2010). Vestimentiferans, on the other hand, are large tubeworms that are typically found at hydrothermal vents and cold seeps (McMullin et al. 2003). Monilifera is represented by a single genus (i.e., Sclerolinum) that resembles frenulates in size and preferred habitat, but can also be found on decaying organic material (Halanych et al. 2001). Finally, Osedax, first described in 2004, are worms that colonize whale bones (Rouse et al. 2004; Glover et al. 2005).

Adult siboglinids lack a functional gut and instead rely on chemosynthetic endosymbionts to supply some or all of their energetic needs (Cavanaugh and Gardiner 1981; Hilário et al. 2011). In this context, hydrogen sulfide (H2S) is absorbed and transported via the blood vascular system to symbiotic bacteria within a specialized organ called the trophosome (Southward 1988; Goffredi et al. 2005; Katz et al. 2011; Bright et al. 2012). Most siboglinid endosymbionts are chemoautotrophic and generally belong to the Gammaproteobacteria (Thornhill et al. 2008; Verna et al. 2010). Osedax, whose morphology is more arborescent in appearance, harbor heterotrophic endosymbionts (Oceanospirillales, Gammaproteobacteria) in a root-like system that extends into the whale bone matrix (Goffredi et al. 2005), where endosymbionts utilize the complex compounds released from the bones (Rouse et al. 2004). Approximately 31 lineages of Osedax have been discovered (Smith et al. 2015), and phylogenetic analyses based on ribosomal genes and mitochondrial cytochrome oxidase I usually place Osedax as sister to a moniliferan–vestimentiferan clade (Rouse et al. 2004; Glover et al. 2005), but Glover et al. (2013) and Rouse et al. (2015) suggest a position sister to frenulate siboglinids. Despite this suggestion, recent analyses of whole mitochondrial genome data strongly favor allying Osedax with vestimentiferans and moniliferans (Li et al. 2015; Fig. 1).

Fig. 1 Current hypothesized phylogeny of Siboglinidae based on Li et al. (2015, Fig. 3a). Majority rule (50 %) consensus topology of a Bayesian analysis of mitochondrial genome data is shown. Values are shown next to nodes with posterior probabilities left and ML bootstrap support values right. Filled circles indicate fully supported nodes (bs = 100, pp = 1.00). Additional analytical details found in Li et al. (2015)

For some chemoautotroph-bearing siboglinids, H2S uptake and transport is mediated by specialized hemoglobins (Hbs) (Numoto et al. 2005; Meunier et al. 2010). Reversible binding of H2S to Hbs has been best studied in the vestimentiferans Riftia pachyptila and Lamellibrachia luymesi, as well as the frenulate Oligobrachia mashikoi (e.g., Suzuki et al. 1990; Yuasa et al. 1996; Zal et al. 1996a, b, 1997). Hbs are complex structures with individual globin chains assembling into hetero-dimer subunits. Those subunits, in turn, assemble into a tetrameric functional protein, with each heme directly interacting with adjoining subunits whose size varies (Numoto et al. 2008). Vestimentiferans have one large extracellular Hb (V1: ~3500 kDa) and one small extracellular Hb (V2: ~400 kDa) in their vascular blood. Additionally, they possess one Hb (C1) in coelomic fluid that is reported to be 400 kDa (Arp and Childress 1981; Zal et al. 1996a). Whereas V1 contains 4 heme-containing globin chains (b–e) and 4 linker chains (L1–L4), V2 is composed of 6 globin chains (a–f), and C1 contains 5 globin chains (a–e). In contrast, the frenulate O. mashikoi possesses a single ~400 kDa Hb composed of 24 globin chains with no linkers, comparable to the small extracellular Hbs of vestimentiferans (Yuasa et al. 1996; Numoto et al. 2005). Binding of H2S has been hypothesized to be mediated, in part, by cysteine residues in the V1 chains and by disulfide bridges formed from cysteine-rich linker chains (R. pachyptila’s V1 chain b–B2 and L. luymesi’s V1 chain AIII–A2; Zal et al. 1996b, 1997). However, this only accounts for part of the binding affinity, and zinc moieties bound to amino acid residues at the interface between pairs of A2 chains may also be involved (Flores et al. 2005). With reference to R. pachyptila’s A2 chain, cysteines at positions 4 and 134 are common to all annelid globin chains studied and form a disulfide bridge, while a free cysteine at position 75 is unique to sulfur-oxidizing siboglinids (Zal et al. 1997).

Given our understanding of siboglinid phylogeny (Li et al. 2015), the bone-eating Osedax has likely evolved from ancestors dependent upon chemoautotrophic bacteria (Schulze and Halanych 2003; Hilário et al. 2011) at least 100 million years ago (based on fossil and molecular data; Danise and Higgs 2015). Due to its heterotrophic symbiosis, Osedax is apparently no longer dependent on H2S transport or the modified blood physiology to nourish endosymbionts (Rouse et al. 2004; Goffredi et al. 2005). We assume the ability to bind H2S carries a cost to the organism, as most Hbs lack such affinity and may be selected against in sulfide-free habitats (Bailly et al. 2003). Based on this, we hypothesized that the Osedax Hb system would exhibit differences relative to other siboglinids; specifically, amino acid substitutions for carrying H2S should be lacking in Osedax. To avoid a PCR-based approach that would require multiple primers and attempts to isolate single genes, and because Hbs are ubiquitously expressed in the blood vascular system of siboglinids, we employed high-throughput DNA sequencing to generate transcriptomic data. This methodology allowed examination of amino acid sequences of Hbs and linker proteins from O. mucofloris, three frenulates, a moniliferan, and three vestimentiferans, in addition to publicly available data. Specific targets were the level of conservation among Cys residues (especially at positions 4, 75, and 134) in Hb chains across siboglinids as well as conceptually examining how amino acid differences may influence protein-folding characteristics.

Materials and Methods

Siboglinid Sampling

Siboglinid samples were procured for transcriptome sequencing from a variety of sources (Table 1). Specifically, Christoffer Schander kindly provided O. mucofloris from whale bones near Bergen, Norway, and Sclerolinum contortum from the Håkon Mosby mud volcano off Norway. Samples of Lamellibrachia luymesi, Escarpia spicata, Seepiophila jonesi, and Galathealinum brachiosum were collected in the Gulf of Mexico using the Johnson Sea Link submersible aboard the R/V Seward Johnson. Samples of Siboglinum fiordicum were obtained using a small hand grab on the R/V Aurelia (University of Bergen), and Siboglinum ekmani was obtained by dredge on the R/V Håkon Mosby near Bergen, Norway. At the time of collection, all samples were morphologically identified and stored in RNAlater.

Table 1 Siboglinid sample collection information

Extraction and Sequencing

RNA extraction and cDNA preparation for high-throughput sequencing followed Kocot et al. (2011) and Li et al. (2015). Briefly, RNA was extracted using a TRIzol (Invitrogen) protocol and then purified with the RNeasy kit (Qiagen) using an on-column digestion. Next, single-strand cDNA libraries were reverse transcribed using the SMART cDNA Library Construction kit (Clontech), followed by double-stranded cDNA synthesis using the Advantage 2 PCR system (Clontech). The double-stranded cDNA from O. mucofloris was sequenced on an Illumina MiSeq sequencer at Auburn University using a Nextera (Illumina) protocol, as well as on an Illumina HiSeq 2000 sequencer at the Genomics Services Laboratory at the HudsonAlpha Institute for Biotechnology (Huntsville, AL, USA) using the TruSeq v3 (Illumina) protocol. cDNA for Escarpia spicata, G. brachiosum, L. luymesi, and S. jonesi was sent to the University of South Carolina Environmental Genomics Core Facility (Columbia, SC, USA) for Roche 454 GS-FLX sequencing. Additionally, cDNAs for L. luymesi, S. contortum, S. ekmani, and S. fiordicum were sequenced on an Illumina HiSeq 2000 sequencer at the HudsonAlpha Institute for Biotechnology.

Sequence Assembly

Sequencing reads were digitally normalized using the normalize-by-median script in the khmer package (https://github.com/ctb/khmer/blob/master/scripts/normalize-by-median.py) to facilitate assembly and decrease the likelihood that overrepresentation of reads would cause assembly artifacts (McDonald and Brown 2013). Transcriptome assemblies from MiSeq and 454 data were done de novo with the October 2012 release of Trinity (Grabherr et al. 2011), while HiSeq 2000 data were assembled with the February 2013 release of the same software. For O. mucofloris and L. luymesi, cDNA was run on two different platforms. In these cases, data were assembled separately and each searched for genes of interest.
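The idea behind digital normalization can be illustrated with a minimal sketch. This is not the khmer implementation, which relies on a memory-efficient probabilistic counting structure; the function name, the k-mer size, and the coverage cutoff below are illustrative assumptions rather than the settings used in this study.

```python
from collections import Counter
from statistics import median


def normalize_by_median(reads, k=20, cutoff=20):
    """Keep a read only if the median count of its k-mers is below `cutoff`.

    Illustrative only: exact k-mer counts in a Counter stand in for the
    probabilistic counting structure used by real tools.
    """
    counts = Counter()
    kept = []
    for read in reads:
        kmers = [read[i:i + k] for i in range(len(read) - k + 1)]
        if not kmers:
            continue  # read is shorter than k, so it cannot be evaluated
        # Median abundance estimates how well this read is already covered.
        if median(counts[km] for km in kmers) < cutoff:
            kept.append(read)
            counts.update(kmers)  # only retained reads add to coverage
    return kept
```

Reads from highly expressed transcripts are discarded once their k-mers reach the cutoff, whereas reads from rare transcripts are retained, which evens out coverage before assembly.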

BLAST and Sequence Alignment

Hb and linker sequences of interest were obtained from assembled transcriptomes via BLAST (Altschul et al. 1990) by utilizing Hb and linker sequences acquired from GenBank of siboglinids as well as outgroup organisms as queries (Table 2). Specifically, an e value cutoff of 10⁻⁵ was used in tblastn searches of nucleotide assemblies with the query protein sequences. Arenicola marina, a sulfur-tolerant polychaete, was used as outgroup based on the availability of these sequences. Resulting BLAST hits were filtered using blast2table.pl (available from http://www.genome.ou.edu/informatics.html) with the “top” option, which reports only the best, high-scoring segment pair for each query sequence. Linker sequence hits were manually evaluated based on e value and percent identity to determine similarity. The resulting Hb hits were translated using ESTScan version 3.0.3 (Iseli et al. 1999) and sequences aligned using MUSCLE (Edgar 2004) within MEGA 5.2 (Tamura et al. 2011). The alignment was visually inspected and spuriously aligned data removed based on sequence similarity to the alignment as a whole.
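The filtering logic described above (an e value cutoff followed by retention of the best segment pair per query) can be sketched as follows. This is not the blast2table.pl script; it assumes the standard 12-column BLAST tabular layout, and the function name and example identifiers are hypothetical.

```python
def top_hits(tabular_lines, max_evalue=1e-5):
    """Return the best-scoring HSP per query from BLAST tabular output.

    Assumes the standard 12-column layout: qseqid, sseqid, pident, length,
    mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore.
    """
    best = {}
    for line in tabular_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment headers
        fields = line.split("\t")
        query, subject = fields[0], fields[1]
        evalue, bitscore = float(fields[10]), float(fields[11])
        if evalue > max_evalue:
            continue  # hit fails the e value cutoff
        # Keep only the highest bit score seen so far for this query.
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, evalue, bitscore)
    return best
```

Keying the result by query mirrors the "top" behavior described in the text, in which each query protein contributes at most one segment pair.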

Table 2 GenBank accession numbers for hemoglobin and linker proteins

Gene Tree and 3D Structure Prediction of Data

Following alignments, we focused on the A2 Hb because enough sequences were recovered for Osedax and other siboglinids to allow meaningful comparisons. A2 Hb sequences were manually trimmed of missing leading and trailing positions, and Gblocks version 0.91b (Castresana 2000; Talavera and Castresana 2007) was used to trim poorly aligned positions and divergent regions with the following parameters: minimum number of sequences for a conserved position = 7, minimum number of sequences for a flank position = 7, maximum number of contiguous non-conserved positions = 8, minimum length of a block = 2, and gap positions allowed in all blocks. An appropriate amino acid substitution model for phylogenetic reconstruction was selected using Prottest version 3.4 (Darriba et al. 2011). RAxML version 7.3.8 (Stamatakis 2014) was used to infer a maximum-likelihood gene tree with 100 bootstrap replicates using the PROTGAMMAWAG model, with A. marina serving as the outgroup. Osedax mucofloris, Lamellibrachia luymesi, Siboglinum ekmani, Arenicola marina, and Sabella spallanzanii Hb chain A2 structures were predicted as 3D models using the I-TASSER structure prediction server (Yang et al. 2015).
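The end-trimming step was performed manually, but one plausible reading of it can be sketched programmatically: remove leading and trailing alignment columns in which at least one sequence is missing, while leaving internal gaps for Gblocks to evaluate. The function name, the set of gap characters, and the toy sequences in the usage note are assumptions for illustration only.

```python
def trim_alignment_ends(aligned, gap_chars="-?"):
    """Drop leading and trailing columns in which any sequence is missing."""
    seqs = list(aligned.values())
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to equal length")

    def complete(col):
        # A column is complete when every sequence has a residue there.
        return all(s[col] not in gap_chars for s in seqs)

    start = 0
    while start < length and not complete(start):
        start += 1
    end = length
    while end > start and not complete(end - 1):
        end -= 1
    return {name: s[start:end] for name, s in aligned.items()}
```

For example, `trim_alignment_ends({"a": "--MKC-LV", "b": "MAMKCALV", "c": "-AMKCAL-"})` keeps columns 3 through 7 and preserves the internal gap in sequence `a`.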

Results

Sequencing Results

High-throughput DNA sequencing produced 283,594–750,876 reads for 454, 3,027,776 reads for MiSeq, and 21,397,136–56,067,578 reads for HiSeq 2000 (Table 1). The number of contigs per assembly was 7209–12,080 for 454 data and 17,617–270,658 for MiSeq and HiSeq 2000 data (Table 1).

BLAST Results

Across the eight transcriptomes, tblastn searches returned 12 top hits (e value cutoff of 10⁻⁵) for chain A1, 17 for chain A2, 22 for chain B1, and 12 for chain B2. Upon closer inspection, the single hit to Osedax mucofloris for chain B2 was a contig that was also returned in searches for chain A2 homologs, and the B2 hit was discarded because the contig matched A2 more strongly. These top hits were combined with data acquired from NCBI’s GenBank (Table 2) to generate alignments for each of the four Hb chains. After manual removal of redundant and incorrect sequences, a single contig for each chain was retained per taxon. However, after inspection of the alignment, A1 sequences were not recovered for Escarpia spicata and Galathealinum brachiosum. Additionally, the B2 sequence of Siboglinum ekmani had a single stop codon within the protein-coding region. This sequence was further verified via read mapping with Bowtie 2 (Langmead et al. 2009). Furthermore, the sequence aligned well, but since it was not full length, it was not further considered. All contigs recovered contained complete genes except for E. spicata A2 and B2, Seepiophila jonesi B2, S. ekmani A1, B1, and B2, and all Siboglinum fiordicum contigs.

As linkers aid formation of Hb hexagonal bilayer structure, we also examined Osedax linkers to determine if they are similar to those from vestimentiferans. The tblastn searches for linker sequences resulted in multiple hits for each species. The 454 assemblies of E. spicata, G. brachiosum, S. jonesi, and L. luymesi had relatively few hits at 5, 6, 9, and 18 hits, respectively. Illumina assemblies had higher numbers of hits, with 23 for O. mucofloris, 44 for S. ekmani, 47 for S. fiordicum, 75 for L. luymesi, and 118 for S. brattstromi. Upon manual inspection of each taxon’s BLAST scores, all 8 transcriptomes were found to have an on-average higher score, e value, and percent identity for hits to vestimentiferan linkers than to non-siboglinid linkers (Table 3). Linker sequences showed considerable variation limiting alignment and the ability to produce a meaningful gene tree.

Table 3 Averages of the BLASTX results of linker sequences from vestimentiferans and non-siboglinids to eight transcriptomes generated in this study

Cysteine Presence/Absence

For chains A1 and B1, no free cysteine occurred at conserved amino acid positions for any taxon. For chain A2, a conserved free cysteine at position 75, corresponding to that found by Zal et al. (1997), was present in all taxa except G. brachiosum (Fig. 2). This species lacked a free cysteine between the two cysteines involved in the formation of disulfide bridges. For chain B2, one incorrect BLAST hit was recovered for O. mucofloris (i.e., an A2 hit returned for the B2 search); however, a conserved free cysteine was found for all other taxa excluding E. spicata, G. brachiosum, and A. marina.
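Checking a residue at a reference position requires translating ungapped reference coordinates into alignment columns, because gaps shift the numbering. The caption of Fig. 2, for instance, numbers the conserved cysteines 23, 94, and 153, which is consistent with a constant offset of 19 from reference positions 4, 75, and 134. The sketch below shows one way to perform that mapping; the function names are hypothetical and the sequences in the usage note are invented toy data, not the A2 sequences analyzed here.

```python
def column_for_reference_position(ref_aligned, position):
    """Map a 1-based ungapped reference position to a 0-based alignment column."""
    seen = 0
    for column, residue in enumerate(ref_aligned):
        if residue != "-":
            seen += 1
            if seen == position:
                return column
    raise ValueError("position lies beyond the end of the reference")


def residues_at(alignment, ref_name, positions):
    """Report each taxon's residue at the given reference positions."""
    ref = alignment[ref_name]
    columns = {p: column_for_reference_position(ref, p) for p in positions}
    return {
        name: {p: seq[col] for p, col in columns.items()}
        for name, seq in alignment.items()
    }
```

With the toy alignment `{"ref": "--ACDC", "taxon1": "MKAC-C", "taxon2": "--ASDC"}`, calling `residues_at(toy, "ref", [2, 4])` shows that `taxon2` carries a serine where the reference has a cysteine at position 2, while all three sequences share the cysteine at position 4.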

Fig. 2 Amino acid alignment of chain A2 for siboglinids. Alignment was generated in MEGA 5.2 using MUSCLE and visualized using UniPro UGENE (Okonechnikov et al. 2012). Bars at the top of the alignment show percentage of conserved identical amino acid for that position. Conserved cysteines at positions 23, 94, and 153 shown in purple (Color figure online)

Gene Tree and 3D Structure Prediction

The final alignment of the 12 A2 chain sequences had 116 amino acid positions. Maximum-likelihood analysis of this alignment placed the O. mucofloris A2 sequence between the A2 sequences of frenulates and a moniliferan/vestimentiferan clade; however, frenulate sequences were recovered as paraphyletic with weak support (Fig. 3). The O. mucofloris chain A2 sequence was recovered as sister to the moniliferan/vestimentiferan chain A2 clade with moderate support (bootstrap = 73).

Fig. 3 Hemoglobin chain A2 gene maximum-likelihood tree reconstructed with RAxML using the PROTGAMMAWAG model. The optimal topology had a ln likelihood of −1995.806816. Bootstrap support values >50 % are shown at the relevant node

Reconstruction of 3D models resulted in similar predictions for the three siboglinid species examined (Figs. 4, 5). Specifically, I-TASSER predicted a heme ligand binding site for each siboglinid with high confidence (0.99–1.00 C score), and no other ligand binding factor produced a C score >0.03. Ligand binding site residues predicted by I-TASSER (Supplementary Table 1) were identical between L. luymesi and O. mucofloris, while S. ekmani had only one difference and a single codon insertion at the 5′ end of the sequence before the binding pocket region. The predicted A. marina structure was identical to S. ekmani with the exception of an additional binding site residue located seven amino acid positions before the first position in S. ekmani’s binding site. S. spallanzanii started in the same relative position as A. marina, but had differences in predicted ligand binding site residues from all other taxa examined. The 3′ residues were more conserved across all five taxa compared to 5′ residues.

Fig. 4 3D structure prediction of (a) Osedax mucofloris, (b) Lamellibrachia luymesi, (c) Siboglinum ekmani, (d) Arenicola marina, and (e) Sabella spallanzanii A2 chain proteins using the I-TASSER protein structure prediction server (Yang et al. 2015). Predicted heme binding site shown in green (Color figure online)

Fig. 5 Stereoscopic overlay of the 3D structure predictions of Osedax mucofloris (red), Lamellibrachia luymesi (yellow), Siboglinum ekmani (green), Arenicola marina (orange), and Sabella spallanzanii (purple) A2 chain proteins using the SuperPose webserver version 1.0 (Maiti et al. 2004) (Color figure online)

Discussion

Contrary to our hypothesis, analyses presented here suggest Osedax has the biochemical capability of producing sulfur-binding Hbs. Specifically, Osedax mucofloris possesses a free cysteine at position 76 of the chain A2 of its Hbs. Additionally, the predicted 3D structure of this chain (Fig. 4) is nearly identical among siboglinids and the sulfur-tolerant A. marina, implying identical function. Involvement of Hbs in sulfide detoxification as part of Osedax life history at whale fall habitats may account for selection and retention of residues involved in hydrogen sulfide binding. Assuming that free cysteines in Hbs are subject to negative selection in polychaetes from sulfide-free habitats (Bailly et al. 2003), the presence of free cysteines in Hbs in Osedax is consistent with the idea that Osedax not only copes with hydrogen sulfide, but may use Hbs to interact with hydrogen sulfide in biologically important ways (e.g., Hbs have higher binding affinity than cytochrome-c oxidase, which is inhibited by small amounts of hydrogen sulfide; National Research Council 1979). The ability to bind sulfur for detoxification could even be under positive selection (Eichinger et al. 2014).

O. mucofloris possesses Hb linkers with greater similarity to vestimentiferan siboglinids than to sulfide-tolerant polychaetes, a result consistent with a recent phylogeny for the group (Li et al. 2015). This could indicate that Osedax produces hexagonal bilayer Hbs capable of sulfur binding. In the context of siboglinid phylogeny (Fig. 1), the presence of Hb linkers could indicate that the last common ancestor of vestimentiferan/moniliferan and Osedax possessed Hb that bound sulfur as well as oxygen. However, comparisons between reference sequences of vestimentiferan linkers and our novel transcriptomes recovered frenulate and O. mucofloris hits with similar BLAST scores (Table 3). Currently, only vestimentiferan and moniliferan siboglinids have been shown to possess the hexagonal bilayer Hbs that self-assemble with linkers. As other annelids have large hexagonal bilayer Hbs, frenulates, which possess ring-shaped Hbs, seem to have lost the ability to produce linkers capable of creating more complex structures. Both ring and hexagonal bilayer Hbs use the same types of globins (Meunier et al. 2010), and similarities across these globin types likely confound the analyses of linker sequences presented here. Quantification of the molecular mass of Osedax Hb would help determine whether Osedax Hbs form a hexagonal bilayer or a ring structure.

Here, we analyzed O. mucofloris Hb as a first step toward determining how these proteins might function in the biology of these siboglinids bearing heterotrophic endosymbionts. The hemoglobin complex shows variation in size and complexity across siboglinid lineages. However, residues of the A2 heme ligand binding site have apparently remained nearly identical over 60 MY. Thus, aspects of the annelid hemoglobin mechanism have evolved at different rates, presumably due to variation in selective pressures. Such pressures may be tied to endosymbiont biology or the need to detoxify H2S in different host environments. Unlike most siboglinids, Osedax should not require sulfur-binding Hb to support its endosymbionts. Yet sulfur-binding Hb has apparently persisted in this group of bone-eating worms. Osedax experience high levels of hydrogen sulfide during their lives. They possess a high surface area to volume ratio in their root system, similar to the less branched root of Lamellibrachia where hydrogen sulfide uptake occurs (Julian et al. 1999; Huusgaard et al. 2012). Although the root epidermis of Osedax was suggested as an important site for nutrient uptake (Katz et al. 2010), how the mucus sheath that envelops the trunk and root structures of O. mucofloris (Higgs et al. 2011) affects chemical uptake from bones, including hydrogen sulfide, is unclear. Moreover, the exterior surface of whale bones experiences microbial sulfide production, with the potential for bone interiors to have reducing microbial activity due to degradation of hydrophobic lipids, a process that can be facilitated by Osedax (Treude et al. 2009). The presence of hydrogen sulfide within bones is further supported by observations of iron sulfide staining and white filamentous bacterial mats around Osedax boreholes (Higgs et al. 2011). These factors indicate that Osedax roots occupy an environment with relatively high hydrogen sulfide levels, where the ability to detoxify it may be biologically advantageous.