Introduction

Hydrothermal vents are located along oceanic ridges or active convergent margins on the ocean floor. These areas are characterized by harsh and challenging conditions for metazoans because of the presence of heavy metals and sulfide (both toxic compounds), low availability of oxygen (hypoxia), high temperatures, and low pH (Childress and Fisher 1992; Tunnicliffe 1991). Despite such harsh conditions, hydrothermal vent communities are characterized by both a high abundance of specialized fauna (mostly endemic) and low species richness. This low and specialized biodiversity mainly results from the strong selective constraints that act as a filter to species not adapted to cope with these conditions. The adaptive peculiarities developed by hydrothermal species can be observed at several levels: trophic ability, organ morphology, enzyme activity, respiratory pigment affinity, and ATP synthesis (Childress and Fisher 1992). In particular, response to hypoxia is possibly the most basic challenge that metazoans must overcome to thrive and reap the benefits of the local primary production (Hourdez and Lallier 2007).

As an example, respiratory adaptations found in hydrothermal vent species can act at different levels of organization: behavior (avoidance of some areas, variations in ventilation), morphology (increased gill surface areas, reduced diffusion distances), biochemistry (metabolism, presence of respiratory pigments), and the molecule itself (properties of the respiratory pigments) (for a review, see Hourdez and Lallier 2007). In particular, the respiratory pigments of vent species usually exhibit high oxygen affinities compared to those of littoral species living in well-oxygenated environments (Hourdez and Weber 2005; Hourdez and Lallier 2007). In some annelids, extracellular hemoglobins that circulate at high concentrations represent a significant oxygen store. In addition, their high oxygen affinity allows oxygen uptake from the environment even when its partial pressure is low. Finally, some hemoglobins can reversibly bind both O2 and sulfide, an ability that is essential for the functioning of the symbiosis in the vestimentiferan tubeworm Riftia pachyptila (Arp and Childress 1983; Childress and Fisher 1992; Weber and Vinogradov 2001).

The Polynoidae scale-worms are very diverse in the hydrothermal ecosystem, representing ~ 10% of all invertebrate species (Tunnicliffe 1991). Different species occupy all the hydrothermal habitats where metazoans are found, from the coldest areas (~ 2 °C) to the warmest, and most hypoxic, areas near venting fluids (~ 40 °C). Before the discovery of hydrothermal vent species, scale-worms (annelids that include the Polynoidae) were thought to possess only intracellular globins, in the muscles (myoglobin) and particularly in the nerve cord (neuroglobin) (Weber 1978; Dewilde et al. 1996). Interestingly, all hydrothermal polynoid species possess a red coelomic fluid, owing to the presence of extracellular hemoglobins (Hourdez et al. 1999a; Hourdez unpub. data). In the genus Branchipolynoe, two basic types of extracellular hemoglobins exist: a single-domain and a tetra-domain globin. The latter was shown to likely result from evolutionary tinkering based on the tandem duplication of an ancestral single-domain intracellular globin (Projecto-Garcia et al. 2010). Although tetra-domain hemoglobins are so far restricted to the genera Branchipolynoe (Hourdez et al. 1999a) and Branchinotogluma (Hourdez, unpub. data), all the other endemic vent polynoids possess at least single-domain extracellular hemoglobins, on which we focused the present study of their adaptive evolution.

Hypoxic vent environments led to functional innovations in respiratory pigments that are essential for the survival of species (Bailly et al. 2002, 2003; Projecto-Garcia et al. 2010). Adaptive molecular signatures and the action of positive selection at the amino acid level can be detected by examining variations in the non-synonymous/synonymous substitution rate ratio (ω = dN/dS), either between closely related evolutionary lineages or between codon sites along the coding sequence of a given gene (Yang 1998; Yang and Nielsen 2002). Using this phylogenetic tool, we investigated the possible adaptive role of amino acid changes during the evolution of the single-domain extracellular globin in scale-worms from a wide range of contrasting conditions and life-styles (and thus different selective constraints), including hydrothermal vent, shallow-water, and non-vent abyssal polynoid species. We were especially interested in testing the lineages that mark transitions between ecological groups for signatures of selection relevant to hemoglobin (Hb) evolution in these contrasting environments: (i) shallow water vs. deep sea; (ii) deep sea vs. hydrothermal vents; (iii) hydrothermal vents vs. acquisition of gills and multi-domain Hb; and, finally, within this last group, (iv) commensal vs. free-living species.

Materials and Methods

Animal Collection

The collected species, sampling areas, and habitats are detailed in Fig. 1 and Table 1. All the deep-sea specimens were identified on board the research vessel, immediately frozen, and stored at −80 °C until used in the laboratory. The species were chosen to represent various microhabitats at hydrothermal vents, from the coldest, with the least hydrothermal influence, to the warmest on the chimney walls (closest to the vent fluid), with temperatures reaching 40 °C near the animals. The pure hydrothermal fluid is anoxic, so its mixing with seawater in variable proportions affects not only temperature but also oxygen content: the warmer the area, the lower the oxygen concentration. Branchinotogluma segonzaci represents the warmest habitat, on the chimney wall (20–40 °C). B. trifurcus and Branchiplicatus cupreus are usually found in colder areas (10–20 °C for the former and 2–10 °C for the latter), farther away from the source of the fluid. A still undescribed species, Branchinotogluma sp. nov., inhabits the periphery of the vents, in water at a stable 2–3 °C. Branchipolynoe seepensis and B. symmytilida live in the mantle cavity of mussels symbiotic with thioautotrophic bacteria (obligatory commensalism: Van Dover et al. 1999; Jollivet et al. 2000), at temperatures usually ranging between 4 and 10 °C. Besides all these species with gills, Lepidonotopodium williamsae represents a free-living, non-branchiate, endemic hydrothermal species; it was collected among mussels and experiences temperatures in the same range as Branchipolynoe spp., possibly slightly higher. In addition to these vent-endemic species, a deep-sea species of the subfamily Eulagiscinae was captured on bare rocks near hydrothermal vents but was not exposed to any vent influence (stable temperature, around 2–3 °C). Harmothoe extenuata is a temperate, shallow-water species and was collected on the rocky shore in Roscoff, France. Sthenelais boa (Sigalionidae), a littoral scale-worm species closely related to polynoids (Norlinder et al. 2012), was used as an outgroup. These three latter species do not possess extracellular single- or multi-domain hemoglobins but have an intracellular globin in their nervous system (neuroglobin) (Weber 1978; Hourdez personal observation).

Fig. 1

World map showing the locations of sampled species. Lau Basin: ABE (20°46′S, 176°11′W) 2150 m depth, Tow Cam (TC, 20°06′S, 176°34′W) 2700 m depth, Kilo Moana (KM, 20°03′S, 176°08′W) 2600 m depth, Tu’i Malila (TM, 21°59′S, 176°34′W) 1900 m depth; East Pacific Rise: 9°50′N area (9°46′N, 104°21′W) 2500 m depth, 11°N area (11°25′N, 103°47′W) 2500 m depth; Mid-Atlantic Ridge: Lucky Strike (LS, 37°18′N, 32°16′W) 1700 m depth; Roscoff, France, 4–6 m depth. Map obtained and edited with Ocean Data View 4 (Schlitzer 2015)

Table 1 Sampling areas and habitat of the different Polynoidae species (in alphabetical order)

Nucleic Acid Extraction and cDNA Synthesis

A standard phenol/chloroform protocol following proteinase K digestion (Sambrook et al. 1989) was used to extract genomic DNA (gDNA) from Branchipolynoe symmytilida, B. seepensis, Branchiplicatus cupreus, and Lepidonotopodium williamsae. For B. segonzaci, B. trifurcus, and the Eulagiscinae, gDNA was isolated following a CTAB + PVPP extraction protocol (Doyle and Doyle 1987). For all species, total RNA was extracted from the anterior part of the worm’s body using TRI Reagent® (Sigma) following the manufacturer’s protocol, and cDNA was then synthesized by reverse transcription using MMLV reverse transcriptase with an oligo(dT)18 or an anchored oligo(dT) primer (see Tables S1 and S2).

cDNA and Gene Sequencing

Sequences were obtained following two different strategies: PCR amplification from genomic DNA or cDNA, and searches in transcriptomes assembled from Illumina HiSeq data.

For PCR amplification, degenerate primers were designed based on previously obtained globin sequences from the Polynoidae Branchipolynoe symmytilida and B. seepensis, as well as the neuroglobin of the Aphroditidae Aphrodita aculeata. The PCR conditions and the type of template (cDNA or gDNA) differed according to the species (see Table S1). The PCR products were visualized under UV light on a 1.5% agarose gel containing ethidium bromide and cloned with the TOPO TA Cloning kit (Invitrogen). Positive clones were sequenced, and the sequences were used to design specific primers for all the species (Tables S1 and S2). Directional chromosome walking on gDNA (see Projecto-Garcia et al. 2010 for details) was used to sequence the missing parts of the coding sequences, the 5′ UTR, and the promoter region of the globin genes for some species. When sequences were obtained as several fragments, sufficiently long overlapping regions were used to assemble them into a full-length sequence.

For the two non-vent species (the deep-sea Eulagiscinae and the shallow-water H. extenuata), the intracellular globin sequence was retrieved from RNA-Seq data (unpub. data). Briefly, total RNA was extracted as described above, checked for quality, and sent for sequencing. Sequencing was performed at the McGill University platform with Illumina HiSeq 2000 technology. One lane was used per species, yielding 80 million 108-base paired-end reads. For each species, the reads were assembled with Velvet/Oases using a k-mer length of 51. The globin sequences were recovered by tblastx searches against the assembled sequences, using a vent species globin sequence as the query.
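The role of the k-mer length can be illustrated with a minimal sketch. De Bruijn graph assemblers such as Velvet break each read into overlapping words of length k, so a 108-base read contributes 108 − 51 + 1 = 58 k-mers when k = 51. The Python below only shows this decomposition and counting step under that assumption; it is not Velvet/Oases code, it ignores reverse complements, error correction, and graph construction, and the function names and toy read are hypothetical.

```python
from collections import Counter


def kmers(read, k):
    """Return every overlapping substring of length k (empty if the read is shorter than k)."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]


def count_kmers(reads, k):
    """Count k-mer occurrences across a collection of reads."""
    counts = Counter()
    for read in reads:
        counts.update(kmers(read, k))
    return counts


K = 51          # k-mer length reported in the text
READ_LEN = 108  # read length reported in the text

# Toy read of the reported length (hypothetical sequence)
toy_read = ("ACGT" * 27)[:READ_LEN]
print(len(kmers(toy_read, K)))  # number of 51-mers contributed by one 108-base read
```

Shorter k values give more, but less specific, overlaps between reads; the choice of 51 is a trade-off that the assembler's graph inherits.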

Protein Sequence and Phylogenetic Analyses

The nucleotide sequences obtained by Sanger sequencing were assembled, checked, and edited based on their chromatograms with CodonCode Aligner 2.0.6 (http://www.codoncode.com/aligner/index.htm). All cDNA sequences were translated into amino acid sequences using the universal genetic code. The sequences have been submitted to GenBank (Accession Numbers GU121978-GU121983; KJ756506, KJ756507, and KP984527). Multiple nucleotide and amino acid sequence alignments were performed with the MUSCLE algorithm (Edgar 2004), as implemented in Geneious 7.0.3 (Biomatters). The alignment was optimized to minimize the number of indels by adjusting the codon alignment to the amino acid alignment, using the invariant residue positions associated with the globin fold and heme pocket. This optimization was confirmed with the GUIDANCE filter (Penn et al. 2010), and all poorly supported regions (low GUIDANCE scores) were removed before subsequent analyses.
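Translation with the universal (standard) genetic code amounts to a codon lookup. The sketch below assumes an in-frame DNA or RNA coding sequence and builds the standard table (NCBI translation table 1) from the compact 64-letter string in which codons are enumerated in TCAG order. It is an illustration only, not the software used in this study, and `translate` is a hypothetical helper name.

```python
BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order: TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(cds, to_stop=False):
    """Translate an in-frame coding sequence; '*' marks stop codons, 'X' ambiguous codons.

    A trailing incomplete codon is ignored. If to_stop is True, translation halts at the
    first stop codon, which is not included in the output.
    """
    seq = cds.upper().replace("U", "T")
    protein = []
    for pos in range(0, len(seq) - len(seq) % 3, 3):
        aa = CODON_TABLE.get(seq[pos:pos + 3], "X")
        if aa == "*" and to_stop:
            break
        protein.append(aa)
    return "".join(protein)


print(translate("ATGGCCTGGTAA"))  # MAW*
```

For scale, a 417-nt coding sequence including the stop codon, as reported for the complete single-domain globins, corresponds to 139 codons, that is, 138 amino acids plus the stop.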

Tree Reconstruction

A Bayesian reconstruction of the globin tree (Fig. 2) was performed with MrBayes (Huelsenbeck and Ronquist 2001; Ronquist and Huelsenbeck 2003) using the amino acid sequences of all the Polynoidae globins obtained here together with other extracellular and intracellular annelid globins (Fig. 3). We used the WAG + I + G + F model of amino acid substitution (selected with ProtTest 3.0, Darriba et al. 2011); the analysis was run for 4,000,000 generations, sampling every 10,000 generations, with default priors.

Fig. 2

Bayesian phylogenetic tree based on the annelid globin residues of the alignment in Fig. 3. The type of each globin sequence is identified in the figure. The magnified area represents the Polynoidae single-domain globins. Posterior probability (PP) values, when indicated, are shown next to the corresponding branch or coded as follows: ***: ≥0.95, **: ≥0.8, *: ≥0.7. Values below 0.7 are not shown (lowest PP = 0.5). The conserved amino acid residues are indicated for each color-coded group: yellow, all sequences; green, all sequences but Ascaris, Arenicola, Riftia, and Alvinella; salmon, all sequences but sperm whale (Phyca). See Fig. 3 for abbreviations

Fig. 3

Alignment of globin sequences from annelids, nematodes, and a vertebrate (sperm whale, in bold). Polynoidae single- and tetra-domain globin sequences are shaded in light gray. Conserved residues are shown in bold (CD1F and F8H), and heme pocket residues that explain the high O2 affinity in Ascaris are shaded in dark gray in the Polynoidae and other species. Cysteines forming an intrachain disulfide bridge in typical extracellular annelid globins (A2C and H10C) are underlined. Arrows indicate the residues under positive selection in Branchipolynoe. Conserved intron positions (I1 and I2) are shown above the sequences. d and p represent the distal and proximal contacts with the heme group, respectively, using the Phyca myoglobin as the reference. Polynoidae sequences: Bsy: B. symmytilida; Bse: B. seepensis; Bseg: B. segonzaci; Btri: B. trifurcus; Bngnov: Branchinotogluma sp. nov.; Brcu: B. cupreus; Lewi: L. williamsae; Eulagisc: Eulagiscinae; Harmoext: H. extenuata. Other globin sequences: Sboa: Sthenelais boa neuroglobin; Aacu: Aphrodita aculeata; Gly: Glycera sp.; Tylo: Tylorhynchus heterochaetus; Lumt: Lumbricus terrestris; Tubifex: Tubifex tubifex; Phese: Pheretima sieboldi; Rifb: Riftia pachyptila HBL-Hb and Riftia: R. pachyptila intracellular globin; Lam: Lamellibrachia sp.; Amarina: Arenicola marina; Alvinella: Alvinella pompejana; Ophelia: Ophelia bicornis; Asuum: Ascaris suum; Omashikoi: Oligobrachia mashikoi; Phyca: Physeter catodon. SD: single-domain; D1-D4: multi-domain globin type; Ng: neuroglobin; Mb: myoglobin; Hb: hemoglobin

A maximum likelihood (ML) tree (Fig. 4) of the single-domain globin sequences from all the polynoid species was constructed with PhyML (Guindon and Gascuel 2003) in Geneious 7.0.3 (Biomatters), using the GTR + I + G model of nucleotide substitution (selected with jModelTest 2.0, Darriba et al. 2012) and NNI for the topology search. Prior to this analysis, the sequences were analyzed with Gblocks v0.91b (http://molevol.cmima.csic.es/castresana/Gblocks.html) and Gap Strip/Squeeze v2.1.0 (http://www.hiv.lanl.gov/content/sequence/GAPSTREEZE/gap.html) to evaluate which gaps to retain or delete for further analyses. Bootstrap values of the trees built from the output alignments of those programs were considerably lower (data not shown), so we proceeded with the initial alignment (Fig. S1). This tree was used as the phylogenetic framework for the positive selection analyses (Fig. 4).

Fig. 4

Maximum likelihood globin tree (443-bp alignment). Bootstrap values are shown above each branch. For each lineage, ω is shown in bold, and the ratios indicate the maximum likelihood estimates of the numbers of non-synonymous (dN) over synonymous (dS) substitutions for the entire globin gene; a, b, and c represent the lineages chosen for the branch-site model test (see Results). In relevant clades, amino acids in blue represent the positions corresponding to B10 and E7 (high O2 affinity in Ascaris) and those in red to E11 and F6 (positive selection in the Branchipolynoe branch). Species distribution and important characteristics are represented to the right of the tree. Sboa: Sthenelais boa, Harmoext: Harmothoe extenuata, Lewi: Lepidonotopodium williamsae, Brcu: Branchiplicatus cupreus, Bngnov: Branchinotogluma sp. nov., Btri: Branchinotogluma trifurcus, Bseg: B. segonzaci, Bse: Branchipolynoe seepensis, Bsy: B. symmytilida. SD: single-domain

Positive Selection and Associated Tests (Codeml)

The search for potential positive selection among branches and codon sites was performed by maximum likelihood, following the procedures described by Nielsen and Yang (1998), Yang (1998), and Yang and Nielsen (2002) and the instructions of the PAML package (Codeml program).

We used the single-domain globin phylogeny of the Polynoidae (Fig. 4) as a framework, with the Sthenelais boa (Sboa) sequence as an outgroup. We first tested whether the dN/dS (ω) ratios differed among lineages with a likelihood ratio test (LRT = 2Δλ, twice the difference in log-likelihood) between the one-ratio branch model (same ω for all branches) and the free-ratio branch model (ω free to vary among branches). The LRT statistic can be compared to a χ2 distribution with degrees of freedom equal to the difference in the number of parameters between the two models (Yang 1998). The power and accuracy of the LRT were evaluated by Anisimova et al. (2001), who found it robust to violations of its assumptions. Once the branches with ω values at least twice the average value were identified (a possible indication of positive selection), we searched for differences in ω among sites on those specific branches. Yang and Nielsen (2002) implemented a test that allows ω to vary both among sites and among lineages (branch-site model). We performed an LRT comparing MA, a combination of the two-ratio branch model with the positive selection site model (M2a, in which codons fall into three ω categories: 0 < ω < 1, ω = 1, ω > 1; Yang and Nielsen 2002), against the nearly neutral site model (M1a, in which codons fall into two ω categories: 0 < ω < 1, ω = 1; Yang and Nielsen 2002). A second test, comparing MA against MA with ω2 fixed at 1 (MAω = 1), allowed us to test whether the site variability was actually due to positive selection rather than genetic drift or relaxed selection (Yang and Nielsen 2002; Wong et al. 2004).
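The LRT arithmetic can be reproduced without PAML. The Python sketch below computes 2Δℓ from two log-likelihoods and the upper-tail χ2 probability for an integer number of degrees of freedom, using the exact closed-form series for even and odd df so that no statistics library is needed. As a check, it can be applied to the branch-model values reported in the Results (LRT = 28.98, df = 15). The function names are hypothetical, and the sketch is not the Codeml implementation.

```python
import math


def lrt_statistic(lnl_null, lnl_alt):
    """LRT = 2 * (lnL_alt - lnL_null), where the null model is nested in the alternative."""
    return 2.0 * (lnl_alt - lnl_null)


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-squared variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) + sqrt(2/pi) * exp(-x/2) * sum_r x^((2r-1)/2) / (1*3*...*(2r-1))
    root = math.sqrt(x)
    term, series = root, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        series += term
    return math.erfc(root / math.sqrt(2.0)) + math.sqrt(2.0 / math.pi) * math.exp(-half) * series


# Branch-model comparison reported in the Results: one-ratio vs. free-ratio
p_value = chi2_sf(28.98, 15)
print(f"p = {p_value:.4f}")
```

The exact series avoids numerical integration; for very large df a regularized incomplete gamma routine from a statistics library would be the usual alternative.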

Sites under positive selection were identified by a Bayesian analysis, in which the posterior probability of belonging to each site class (0 < ω < 1, ω = 1, or ω > 1) is calculated for each site from the parameter estimates for the dataset. By definition, sites under positive selection belong to the class ω > 1. Only sites with posterior probabilities greater than 95% were considered (Yang 2008). We used the Bayes Empirical Bayes (BEB) procedure implemented in Codeml. Unlike the earlier Naive Empirical Bayes (NEB) approach, BEB accounts for sampling errors in the maximum likelihood estimates of the model parameters, which makes it better suited to small datasets such as ours (Yang et al. 2005).
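The logic of assigning sites to ω classes can be sketched in a few lines. For each site, the posterior probability of class k is p_k·L_k / Σ_j p_j·L_j, where p_k is the estimated proportion of sites in class k and L_k is the likelihood of that site's data under class k. This is the naive empirical Bayes calculation; BEB additionally integrates over uncertainty in the parameter estimates, which the sketch does not attempt. All numbers and function names below are hypothetical.

```python
def site_class_posteriors(proportions, likelihoods):
    """Naive empirical Bayes posterior of each omega class at one site."""
    joint = [p * lik for p, lik in zip(proportions, likelihoods)]
    total = sum(joint)
    return [value / total for value in joint]


def positively_selected_sites(proportions, site_likelihoods, positive_class, cutoff=0.95):
    """Return (site number, posterior) for sites whose positive-class posterior exceeds cutoff."""
    hits = []
    for site, likelihoods in enumerate(site_likelihoods, start=1):
        posterior = site_class_posteriors(proportions, likelihoods)[positive_class]
        if posterior > cutoff:
            hits.append((site, posterior))
    return hits


# Hypothetical class proportions for 0 < omega < 1, omega = 1, and omega > 1
proportions = [0.80, 0.15, 0.05]
# Hypothetical per-site likelihoods under each class
site_likelihoods = [
    [1e-5, 1e-5, 1e-5],  # uninformative site: posterior equals the prior proportions
    [1e-8, 1e-7, 1e-4],  # site fitting the omega > 1 class much better
    [1e-4, 1e-5, 1e-6],  # conserved site
]
print(positively_selected_sites(proportions, site_likelihoods, positive_class=2))
```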

Ancestral Sequence Reconstruction

Using the same globin phylogeny (Fig. 4) as a reference, ancestral sequences were reconstructed by maximum likelihood with an empirical Bayes approach (Koshi and Goldstein 1996; Yang 2008; PAML program instructions) in Codeml (model = 0 and NSsites = 0).
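Once marginal posterior probabilities are available for each site at a node (Codeml can write such per-site probabilities to its results files), the reconstructed sequence is the most probable residue at each site. The sketch below illustrates only that selection step and reports the weakest per-site support; it does not compute the posteriors themselves, and the posterior values and function name are hypothetical.

```python
def reconstruct_node(site_posteriors):
    """Choose the most probable residue at each site from marginal posteriors.

    site_posteriors: one dict per site mapping residue -> posterior probability.
    Returns the reconstructed sequence and the posterior of each chosen residue.
    """
    residues, support = [], []
    for posterior in site_posteriors:
        best = max(posterior, key=posterior.get)
        residues.append(best)
        support.append(posterior[best])
    return "".join(residues), support


# Hypothetical posteriors for three sites at one ancestral node
node = [
    {"V": 0.97, "A": 0.02, "I": 0.01},
    {"T": 0.91, "S": 0.09},
    {"S": 0.99, "N": 0.01},
]
sequence, support = reconstruct_node(node)
print(sequence, min(support))
```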

Three-Dimensional Modeling of Globins and Localization of Key Amino Acid Replacements

To construct 3D homology models of some of the polynoid globin sequences, we used the tools available on the SWISS-MODEL website (https://swissmodel.expasy.org/interactive), with ProMod3 and MODELLER (Arnold et al. 2006; Biasini et al. 2014; Bordoli et al. 2009). Briefly, this tool builds a 3D model of a target amino acid sequence from the available 3D structure of the PDB template with the best PSI-BLAST score against that sequence. Atomic energies were then calculated and minimized with a force field.

The resulting rough model was visualized with the UCSF Chimera package from the Resource for Biocomputing, Visualization, and Informatics at the University of California, San Francisco (Pettersen et al. 2004). The same software was used to improve the graphical rendering of the model, to highlight important residues, and to insert the heme group into the heme pocket, using the heme coordinates from the template structure. Structural alignments were analyzed with the PyMOL Molecular Graphics System v1.8.2.1 (DeLano 2008).

Recombinant Globin Expression and Oxygen Binding Properties

The full-length coding sequences of the Branchipolynoe symmytilida, Branchinotogluma trifurcus, and Eulagiscinae globins were cloned into a pET20b vector, preserving the stop codon to prevent fusion with the vector's His-tag. Overexpression was performed in BL21(DE3) cells grown at 37 °C in LB supplemented with ampicillin and 1 mM 5-aminolevulinic acid (a heme precursor). After 4 h of induction with 1 mM IPTG, the cells were pelleted by centrifugation, resuspended in lysis buffer (25 mM Tris/400 mM NaCl, pH 7.5), and lysed with a French press. Cellular debris was removed by centrifugation, and the globin was purified from the supernatant by size-exclusion chromatography on a Superose 12 column, using an elution buffer identical to the lysis buffer.

Oxygen equilibrium curves were obtained with a modified diffusion chamber (Sick and Gersonde 1969) using a stepwise procedure as previously described (Weber et al. 1976). Briefly, small (4 µl) aliquots of purified recombinant globin solution (~ 0.3 mM heme final concentration) were equilibrated with mixtures of pure N2 and O2 prepared by mass-flow meters, and the resulting changes in absorption spectra were followed at 430 nm with a diode array spectrophotometer (Ocean Optics). The saturation (S) versus PO2 (partial pressure of oxygen) data were linearized according to the Hill equation, log(S/(1 − S)) = n log PO2 − n log P50, and the values of P50 (the PO2 at which the globin is half-saturated with oxygen) and n50 (the Hill coefficient, i.e., cooperativity, at P50) were derived from linear regressions on the data points between 30 and 70% saturation. The sample pH was adjusted by dilution with a buffer solution of greater strength (500 mM Tris/400 mM NaCl).
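The Hill linearization and the 30 to 70% window can be sketched as an ordinary least-squares fit of log10(S/(1 − S)) against log10 PO2: the slope is n50, and P50 follows from the intercept. The synthetic curve below is generated from assumed values (P50 = 1.0, n = 1.2, in arbitrary PO2 units) purely to check that the fit recovers them; it is not data from this study, and `hill_fit` is a hypothetical helper.

```python
import math


def hill_fit(po2, saturation, lo=0.3, hi=0.7):
    """Least-squares Hill plot: log10(S/(1-S)) = n*log10(PO2) - n*log10(P50).

    Only points with lo <= S <= hi are used. Returns (P50, n50), with P50 in the units of po2.
    """
    xs, ys = [], []
    for p, s in zip(po2, saturation):
        if lo <= s <= hi:
            xs.append(math.log10(p))
            ys.append(math.log10(s / (1.0 - s)))
    if len(xs) < 2:
        raise ValueError("need at least two points inside the saturation window")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    n50 = sxy / sxx                    # slope of the Hill plot = Hill coefficient
    intercept = mean_y - n50 * mean_x  # equals -n50 * log10(P50)
    p50 = 10 ** (-intercept / n50)
    return p50, n50


# Synthetic curve from assumed parameters (hypothetical, not measured values)
P50_TRUE, N_TRUE = 1.0, 1.2
po2 = [0.25, 0.5, 0.75, 1.0, 1.5, 2.0, 4.0]
saturation = [p ** N_TRUE / (p ** N_TRUE + P50_TRUE ** N_TRUE) for p in po2]
p50, n50 = hill_fit(po2, saturation)
print(f"P50 = {p50:.3f}, n50 = {n50:.3f}")
```

Restricting the fit to intermediate saturations keeps the regression on the roughly linear part of the Hill plot, where measurement error in S has the least leverage on log(S/(1 − S)).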

Results

Single-Domain gDNA/cDNA Amplification and Sequencing

Coding Sequences

In this study, we produced globin sequences for B. segonzaci, B. trifurcus, Branchiplicatus cupreus, Lepidonotopodium williamsae, a species of Eulagiscinae, and Harmothoe extenuata.

For Branchinotogluma segonzaci, B. trifurcus, Branchiplicatus cupreus, and Lepidonotopodium williamsae, several slightly different cDNA sequences were obtained, indicating either polymorphism at a single coding locus (i.e., alleles) or the presence of different globin loci in these species. For the following analyses, a consensus sequence was produced for each species, using the most common nucleotide at each position among the sequenced clones and assembling the different parts of the gene wherever upstream and downstream regions could be aligned. For B. cupreus, the two sequences (SD1 and SD2) differed so much (90.7% sequence identity; transition/transversion rate ratio κ = 2.17) that they likely correspond to two different loci in this species, and only one sequence was considered for the following analyses.

For Branchipolynoe symmytilida, Branchipolynoe seepensis, B. segonzaci, and B. trifurcus, the complete cDNA sequences of the single-domain globin have a coding sequence of 417 nucleotides including the stop codon. For Branchinotogluma sp. nov. and B. cupreus, we could only amplify 366 bp (122 codons, including the initial methionine) of the coding sequence, and 385 bp for L. williamsae. These partial sequences correspond to the first two exons and most of the third (and last) exon. Finally, for the Eulagiscinae and Harmothoe extenuata, the complete coding sequences comprise 423 and 417 bp, respectively.

Over the shared 354 bp, five indels were found: two common to all Polynoidae species (compared to the Sigalionidae Sthenelais boa), one present in vent species only, and the last two solely in H. extenuata (Fig. S1). The percentage of nucleotide identity among these single-domain globins is relatively low (37.9%; Fig. S1).
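Identity percentages depend on how gaps are treated, and the text does not state the exact convention used by the alignment software. The sketch below assumes two common definitions: pairwise identity over columns where neither sequence has a gap, and the proportion of alignment columns in which all sequences share the same non-gap character (identical sites). Function names and toy sequences are hypothetical.

```python
def pairwise_identity(a, b, gap="-"):
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    if not compared:
        return 0.0
    same = sum(1 for x, y in compared if x == y)
    return 100.0 * same / len(compared)


def identical_sites(alignment, gap="-"):
    """Percent of columns in which every sequence carries the same non-gap character."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all sequences must be aligned to the same length")
    conserved = sum(
        1 for column in zip(*alignment) if gap not in column and len(set(column)) == 1
    )
    return 100.0 * conserved / length


# Toy aligned fragments (hypothetical)
print(pairwise_identity("ATGGCC-TGG", "ATGACCATGG"))
print(identical_sites(["ATGGCC", "ATGACC", "ATGGCA"]))
```

Because the multi-sequence measure requires agreement across every sequence, it falls quickly as more divergent taxa are added, which is consistent with identity values dropping when a distant sequence is included.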

Promoter Regions and UTRs

For B. symmytilida, B. cupreus, and L. williamsae, our sequences cover the full 5′UTR (~ 68 bp), as well as about 440 bp of the promoter region for B. symmytilida and L. williamsae. For B. seepensis, B. trifurcus, and B. segonzaci, we sequenced 48 bp of the 5′UTR (Fig. S2).

For B. symmytilida and L. williamsae, the promoter sequences were slightly more conserved than their coding sequences (77.1% vs. 75.6% identical sites). In both sequences, the TATA box was located ~ 30 bp upstream of the start of the 5′UTR (Fig. S2). The identity across the common amplified part of the 5′UTR (48 bp) for all vent polynoid species was ~ 80%. This value, however, drops sharply (47.1%) when the 5′UTR of H. extenuata is included (data not shown).

Introns

Introns were successfully amplified and sequenced in all species except the Eulagiscinae and H. extenuata; for B. trifurcus, intron 2 could not be obtained. As reported for B. seepensis and B. symmytilida (Projecto-Garcia et al. 2010), the single-domain genes all exhibit the typical vertebrate globin gene structure, with three exons separated by two introns. The introns are located at the conserved positions B12.2 and G7.0 relative to the Physeter catodon globin fold.

Intron length differed considerably, especially for intron 1, which ranged from 306 bp in B. symmytilida to 746 bp in B. seepensis. Intron 2 length was also variable but within a narrower range, from 180 bp in L. williamsae to 295 bp in B. seepensis. The alignment of all orthologous intron sequences revealed limited identity (4.9% for intron 1 and 12% for intron 2). Within each genus for which we have two species (i.e., Branchipolynoe and Branchinotogluma), however, identity is higher (16.2% for intron 1 and 47.8% for intron 2).

Amino Acid Sequences and Protein Structure

The single-domain (SD) sequences obtained here were aligned with other annelid globins (intra- and extracellular) and, as references, with globin sequences from other representative metazoan groups: two extracellular hemoglobin sequences from the nematode Ascaris suum (a pig intestinal parasite) and the myoglobin of a vertebrate, the sperm whale Physeter catodon (Fig. 2; accession numbers in Fig. 3).

In reference to the Physeter myoglobin fold, the alignment exhibits two conserved residues: a phenylalanine in the CD corner (CD1F) and the proximal histidine on the F helix, to which the heme is bound (F8H). The tryptophan in position A14 was conserved in nearly all globin sequences, except those of the nematode Ascaris and the annelids Arenicola, Riftia, and Alvinella. All sequences also have a conserved tryptophan (H7W) that is not found in the Physeter myoglobin. Although extracellular, the Polynoidae globins lack the two well-conserved cysteines that form a disulfide bridge in the typical extracellular globins of annelids (positions A2 and H10). Over the region of sequence overlap (118 amino acid residues), the Polynoidae sequences share 50% amino acid identity. Several amino acids in the heme pocket are of particular interest. Two residues identified as key to the very high oxygen affinity of Ascaris Hb, tyrosine B10 and glutamine E7, are also present in S. boa and in all the Polynoidae sequences except the Eulagiscinae, in which both positions are occupied by leucine. The pogonophoran annelid O. mashikoi also possesses a glutamine in E7.

Among the polynoid sequences, out of the 30 probable heme contacts (using the sperm whale myoglobin heme contacts as a reference, Fig. 3), only 11 residue positions are affected by changes.

No signal peptide for protein export was found in any of the species for which we obtained sequences upstream of the initial methionine.

Single-Domain Globin Relationship with Other Globins

Relative to the Ascaris and sperm whale globins, the annelid globins segregate into two initial lineages: one comprising the globins that form the typical extracellular hexagonal bilayer hemoglobins (HBL-Hb), and the other comprising all the remaining annelid globins (Bayesian phylogenetic tree, Fig. 2). The topology of the clade that comprises intracellular annelid globins and extracellular polynoid globins reflects current knowledge of annelid phylogeny (Weigert and Bleidorn 2016). In our tree, the Phyllodocida include all the scale-worms (Aphroditidae, Sigalionidae, and Polynoidae) and the Glyceridae. All the Polynoidae sequences group together, regardless of whether they are extracellular or intracellular.

Variation of d N/d S Ratios Among Branches and Tests for Positive Selection

Variations Among Lineages (Branch Model)

Tests for the past action of positive selection were performed using the maximum likelihood tree topology based on the 443-bp alignment of the globin gene (Fig. 4). Of the two single-domain globins (SD1 and SD2) obtained for Branchiplicatus cupreus, only SD1 was used for the following analyses. The same analyses performed with SD2 produced very similar results (data not shown).

The LRT between the one-ratio and free-ratio branch models was significant, indicating that ω (dN/dS) ratios vary among lineages (LRT = 28.98, df = 15, p < 0.025) (Yang 1998). The estimated κ values (transition/transversion rate ratio) were very similar among models, ranging from 1.66 to 1.71. Under the one-ratio model, ω0 = 0.148, indicating moderate overall purifying selection (Table 2).

Table 2 Codeml parameters obtained under different codon substitution models

Focus on Key Evolutionary Branches (Branch-Site Model)

We searched for signatures of evolutionary change on branches (Fig. 4) that correspond to ecological transitions (littoral vs. deep sea and deep sea vs. hydrothermal vents) and to anatomical/physiological transitions (hydrothermal vent species without gills or multi-domain Hb vs. those with gills and multi-domain Hb).

For all the ecological transitions, ω did not exceed 0.209, suggesting that this protein did not accumulate major non-synonymous substitutions that would have adapted its function to the transition from littoral to deep-sea environments or to hypoxic habitats such as hydrothermal vents (Fig. 4). Two branches (a and b in Fig. 4) exhibit infinite ω values because no synonymous substitutions occurred on them. For both branch a (genera Branchipolynoe and Branchinotogluma, a lineage that developed gills and multi-domain Hbs) and branch b, we found no signature of positive selection (Fig. 4; Table 2).
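Why ω becomes infinite on some branches follows directly from the ratio: when a branch carries non-synonymous but no synonymous substitutions, dN/dS has a zero denominator. The sketch below makes that case explicit rather than raising an error; it does not estimate dN or dS, and the input values and function names are hypothetical. An infinite ω produced this way reflects the absence of synonymous changes, not by itself evidence of positive selection, which is why the branch-site tests are still needed.

```python
import math


def omega(dn, ds):
    """Return dN/dS, with the zero-denominator cases made explicit."""
    if ds == 0:
        # No synonymous change: infinite if any non-synonymous change occurred, undefined otherwise
        return math.inf if dn > 0 else math.nan
    return dn / ds


def describe(w):
    """Rough verbal label for an omega value (not a statistical test)."""
    if math.isnan(w):
        return "undefined"
    if w < 1:
        return "purifying selection"
    if w == 1:
        return "neutral expectation"
    return "candidate positive selection (test formally)"


for dn, ds in [(0.0296, 0.2), (0.004, 0.0), (0.0, 0.0)]:  # hypothetical branch estimates
    w = omega(dn, ds)
    print(dn, ds, w, describe(w))
```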

Branch c, leading to the two species of the genus Branchipolynoe (both commensal), yielded a significant LRT, indicating a signature of positive selection on this branch (Table 2). The comparison between M1a and MA showed that MA better fit the data, and an additional test corroborated this result (MA vs. MAω = 1, Table 2). The BEB analysis identified two residues significantly affected by positive selection: 56T (position E11) and 82S (position F6).

Ancestral Globin Reconstruction

These analyses were performed to follow the amino acid substitutions that took place at the nodes of each clade. Overall, the reconstructions had posterior probabilities (PB) for codon changes higher than 89%, except for the node leading to the outgroup S. boa (~ 66%), which was therefore not taken into consideration. S. boa, H. extenuata, and the Eulagiscinae exhibited more amino acid substitutions than the other sequences (Fig. S3). Interestingly, several residues are shared by the littoral H. extenuata and the deep-sea Eulagiscinae (node PB ~ 91%). These residues are located in the B, D, and G helices and the CD and EF corners (Fig. S3). Identity is greater among the species found at hydrothermal vents, but the confidence of the reconstruction of this node is below 0.95 (PB ~ 89%). Curiously, the ancestral node corresponding to branch b (PB ~ 95%) appears to be the departure point for several new residues specific to this clade (44S, 49I, 79T, and 116G), with the exception of B. trifurcus (Fig. S3). On the lineage leading to Branchipolynoe (node PB ~ 99%), three residues are uniquely shared (23V, 56T, and 82S), two of which were found to be under positive selection (Table 2).

Single-Domain Globin 3D Modeling Approach

Homology models were created only for the species for which we had a complete sequence, namely Branchipolynoe symmytilida, Branchinotogluma trifurcus, and the Eulagiscinae (Fig. 5). For the first species, the automatically chosen PDB template was the monomer chain of the hemoglobin from Lumbricus terrestris (PDB: 1ASH, a high-resolution structure), which had 20% amino acid identity with our sequence. This is close to the ‘twilight zone’ (< 20% amino acid identity). However, Pascual-García et al. (2010) showed that if two proteins are known to perform the same function, structural prediction is reliable even below this threshold. For B. trifurcus and the Eulagiscinae, the automatically chosen template with the highest structural identity was the monomeric hemoglobin from Glycera dibranchiata (PDB: 1JF4), with 38 and 28% amino acid identity, respectively.
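Percent amino acid identity is the quantity compared with the ‘twilight zone’ threshold above. The Python sketch below shows one way to compute it from a pairwise alignment. It rests on an assumption: modelling servers differ in the denominator they use (aligned columns, the shorter sequence, or the full alignment), and this version counts only columns where neither sequence has a gap. The function names and the sequences in the usage line are hypothetical.

```python
TWILIGHT_ZONE_PID = 20.0  # percent identity below which alignments become unreliable


def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent identity over gap-free columns of two pre-aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    columns = [
        (a, b) for a, b in zip(seq_a.upper(), seq_b.upper()) if a != "-" and b != "-"
    ]
    if not columns:
        return 0.0
    identical = sum(1 for a, b in columns if a == b)
    return 100.0 * identical / len(columns)


def in_twilight_zone(pid: float) -> bool:
    """True when identity falls below the ~20% 'twilight zone' threshold."""
    return pid < TWILIGHT_ZONE_PID


if __name__ == "__main__":
    pid = percent_identity("GLSAAQ-RQVIA", "GLTAAEKRQLIA")
    print(round(pid, 1), in_twilight_zone(pid))
```

Because the denominator convention changes the value, identities reported by different tools are only roughly comparable near the 20% boundary.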

Fig. 5

3D structural models of the B. symmytilida (Bsy), B. trifurcus (Btri), and Eulagiscinae single-domain globins. The amino acid residues invariant in both vent species in Fig. 3 (B10Y and E7Q) are shown as sticks. Residues under positive selection in Branchipolynoe (E11T and F6S) are shown as rugged spheres (also depicted in the B. trifurcus and Eulagiscinae models). Residues highlighted by the ancestral reconstruction analyses (B7V, E11T, and F6S in Branchipolynoe; D3S/G, E4I, and F3T/N in branch b) are shown as spheres.

Positively selected residues in the Branchipolynoe lineage (branch c in Fig. 4) are highlighted on the B. trifurcus and B. symmytilida models for comparison (Fig. 5, dotted residues). In Branchipolynoe spp., E11T (E11V in B. trifurcus) is also located in the distal region of the heme pocket and points in the same direction as E7Q and B10Y (Fig. 5 a and b), so it could affect ligand binding. The other residue under positive selection is F6S in Branchipolynoe spp. (F6Q in B. trifurcus). It lies in a helix region that, in other annelid globins, is important for the formation of oligomers (dimers form through interactions between helices E and F; Royer et al. 2001, 2005).

The residues highlighted in branches b and c by the ancestral reconstruction analyses are located in the B, F, and H helices, and in the DE corner. The substitutions in the B helix and DE corner were mostly from polar to non-polar residues (Fig. S4). On the other hand, the substitutions in the F and H helices were from non-polar to polar residues.
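A residue grouping and a hydropathy scale make the polar or non-polar character of a substitution explicit. The Python sketch below uses the Kyte-Doolittle hydropathy index and one common textbook grouping of non-polar residues; other groupings treat Cys, Gly, Tyr, and Trp differently. It is an illustration, not the classification used for Fig. S4. The example substitutions are those discussed for the Branchipolynoe lineage: N to V at B7, V to T at E11, and Q to S at F6.

```python
# Kyte-Doolittle hydropathy index (positive = hydrophobic)
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# One common grouping of non-polar side chains (an assumption; conventions vary)
NONPOLAR = set("GAVLIMFWP")


def polarity(residue: str) -> str:
    """Classify a one-letter residue code as 'polar' or 'non-polar'."""
    return "non-polar" if residue.upper() in NONPOLAR else "polar"


def describe_substitution(wild: str, mutant: str) -> str:
    """Summarize a substitution as a polarity change plus a hydropathy shift."""
    wild, mutant = wild.upper(), mutant.upper()
    delta = KYTE_DOOLITTLE[mutant] - KYTE_DOOLITTLE[wild]
    return (f"{wild}->{mutant}: {polarity(wild)} -> {polarity(mutant)}, "
            f"delta hydropathy {delta:+.1f}")


if __name__ == "__main__":
    for wt, mt in [("N", "V"), ("V", "T"), ("Q", "S")]:
        print(describe_substitution(wt, mt))
```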

Oxygen Binding Properties

The oxygen binding properties of the recombinant globins from Branchipolynoe symmytilida, Branchinotogluma trifurcus, and the Eulagiscinae were measured after their overexpression (Table 3). None of the cooperativity coefficients differs significantly from 1, indicating that, if multimers do form, this association does not produce cooperativity. The elution volumes of the different globins on a size exclusion column do not indicate differences in native mass either, suggesting that all the globins remain monomeric (data not shown). As expected for globins that lack cooperativity, pH has no significant effect on P50 (data not shown). The two globins with B10Y and E7Q (B. symmytilida and B. trifurcus) exhibit similar P50 values. Both are much lower (i.e., higher affinities) than that of the Eulagiscinae globin (B10L and E7L). Of these two globins, that of B. trifurcus has a significantly higher affinity (lower P50) than that of B. symmytilida (unpaired t test, p = 0.0003).
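P50 and the cooperativity coefficient are the two parameters of the Hill equation, Y = PO2^n / (P50^n + PO2^n), where Y is fractional saturation. A value of n = 1 corresponds to the absence of cooperativity reported above. The Python sketch below uses hypothetical function names and synthetic data. It evaluates the equation and recovers n and P50 from a linear fit of log[Y/(1 - Y)] against log PO2, the classical Hill plot. It is a generic illustration, not necessarily the fitting procedure behind Table 3. Pressures can be in any unit as long as it is used consistently.

```python
import math
from typing import Sequence, Tuple


def hill_saturation(po2: float, p50: float, n: float = 1.0) -> float:
    """Fractional O2 saturation Y predicted by the Hill equation."""
    if po2 <= 0:
        return 0.0
    return po2 ** n / (p50 ** n + po2 ** n)


def hill_fit(po2: Sequence[float], saturation: Sequence[float]) -> Tuple[float, float]:
    """Estimate (n, P50) by least squares on the Hill plot.

    log10(Y / (1 - Y)) = n * log10(PO2) - n * log10(P50), so the slope is n
    and P50 is the PO2 at which the fitted line crosses zero.
    Only points with PO2 > 0 and 0 < Y < 1 are used.
    """
    pts = [(math.log10(p), math.log10(y / (1.0 - y)))
           for p, y in zip(po2, saturation) if p > 0 and 0.0 < y < 1.0]
    if len(pts) < 2:
        raise ValueError("need at least two points with 0 < Y < 1")
    mean_x = sum(x for x, _ in pts) / len(pts)
    mean_y = sum(y for _, y in pts) / len(pts)
    sxx = sum((x - mean_x) ** 2 for x, _ in pts)
    if sxx == 0:
        raise ValueError("need at least two distinct PO2 values")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pts)
    n = sxy / sxx
    intercept = mean_y - n * mean_x
    p50 = 10 ** (-intercept / n)
    return n, p50


if __name__ == "__main__":
    # Synthetic, non-cooperative curve with an arbitrary P50 of 2.0
    pressures = [0.5, 1.0, 2.0, 4.0, 8.0]
    ys = [hill_saturation(p, p50=2.0, n=1.0) for p in pressures]
    print(hill_fit(pressures, ys))
```

With real measurements, points near Y = 0 or Y = 1 carry large errors on the Hill plot. Fits are therefore often restricted to intermediate saturations.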

Table 3 Oxygen binding properties of the recombinant globins at 15 °C, with those of Ascaris Hb (at 20 °C) for comparison

Discussion

Invertebrate hemoglobins exhibit great structural and functional diversity (Weber and Vinogradov 2001). This diversity results from an early (i.e., more than 500 Mya) and complex evolutionary history. It also results from molecular adaptations to contrasting environmental conditions (e.g., oxygen levels, temperature) and physiological needs. Hydrothermal vents can be very challenging for aerobic organisms, especially with regard to hypoxia and the presence of sulfide (a potent inhibitor of aerobic metabolism) (Carrico et al. 1978; Childress and Fisher 1992). The scale-worm species studied here have also adapted to a wide range of marine conditions. They represent a very successful lineage that colonized the hydrothermal vent ecosystem (Fig. 4; Table 1), the non-vent deep sea, and the intertidal zone. Such challenging conditions can lead to functional innovations essential for the survival of the species.

Hemoglobin Expression in Vent Species

Endemic hydrothermal vent polynoids typically possess extracellular hemoglobins in their coelomic fluid, which give them their red color (Hourdez, unpublished data). The mere expression of hemoglobins in deep-sea polynoids can be regarded as an adaptation to hypoxic conditions. These proteins represent a form of oxygen storage that buffers variations in external oxygen concentration (Hourdez et al. 1999b). For Branchipolynoe seepensis, the oxygen bound to hemoglobins was estimated to cover about 90 minutes' worth of aerobic metabolic needs if the worm were exposed to complete anoxia (Hourdez and Lallier 2007). Extracellular single-domain globins exist in all hydrothermal vent-endemic polynoids. Tetra-domain globins, in contrast, were only detected in the genera Branchipolynoe (Hourdez et al. 1999a; Zhang et al. 2017) and Branchinotogluma (Hourdez, unpublished data). The phylogenetic relationships indicate that all the studied polynoid extracellular globins (single- and tetra-domain) derive from a common ancestral gene, which was probably intracellular (Projecto-Garcia et al. 2010; Fig. 2). The origin of these extracellular globins is distinct from that of the other annelid extracellular globins, which diverged from intracellular ones about 570 million years ago (Goodman et al. 1988).
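An anoxia-tolerance estimate like the ~90 minutes quoted for B. seepensis can be framed as the oxygen bound to hemoglobin divided by the aerobic metabolic rate. The Python sketch below illustrates that arithmetic. All parameter names and values are hypothetical, and the simplifying assumptions are ours: one O2 per heme, stores fully saturated and fully unloaded, a constant metabolic rate, and dissolved and intracellular O2 ignored. The original estimate is reported in Hourdez and Lallier (2007).

```python
def hb_o2_store(heme_mmol_per_l: float, fluid_ml_per_g: float,
                saturation: float = 1.0) -> float:
    """O2 bound to coelomic hemoglobin, in umol O2 per g body mass.

    Assumes one O2 per heme; 1 mmol/L equals 1 umol/mL.
    """
    return heme_mmol_per_l * fluid_ml_per_g * saturation


def anoxia_minutes(store_umol_per_g: float, mo2_umol_per_g_per_h: float) -> float:
    """Minutes the store could cover aerobic needs at a constant uptake rate."""
    if mo2_umol_per_g_per_h <= 0:
        raise ValueError("metabolic rate must be positive")
    return 60.0 * store_umol_per_g / mo2_umol_per_g_per_h


if __name__ == "__main__":
    # Purely hypothetical inputs, chosen only to show the units
    store = hb_o2_store(heme_mmol_per_l=1.0, fluid_ml_per_g=0.5)
    print(anoxia_minutes(store, mo2_umol_per_g_per_h=0.5))
```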

All the globins sequenced here lack a signal peptide. In Harmothoe extenuata and the Eulagiscinae, this is not surprising because the globin is not free in the coelomic fluid but contained in cells (mostly in the nervous system and possibly in muscles). The lack of a signal peptide is surprising for the vent polynoid species, but it was already observed in the single- and tetra-domain globins from Branchipolynoe seepensis and B. symmytilida (Projecto-Garcia et al. 2010). In the vent species Lepidonotopodium piscesae, mass spectrometry data indicated a perfect match in molecular mass for both the myoglobin and the hemoglobin found in the coelomic fluid (unpublished data). This observation was used as evidence that the sequenced genes in Branchipolynoe spp. likely correspond to the hemoglobin found in the coelomic fluid, and that this hemoglobin is released by holocrine secretion (Projecto-Garcia et al. 2010). The detection of a TATA box 30 bp upstream of the 5′UTR start position in the promoter argues against the existence of alternative splicing variants that would carry a signal peptide for secretion.

Interestingly, the 5′UTR and the promoter regions are well conserved in most of the vent species. Although this may indicate some structural or regulatory function(s) for these regions, the physiological relevance of the presence of several regulatory motifs (e.g., CAC binding protein and GATA motifs, data not shown) in SD globins is yet to be ascertained.

Amino Acid Positions Under Positive Selection

The heme pocket of all the polynoid single-domain globin sequences except the Eulagiscinae exhibits two conserved amino acid residues that are not under positive selection, B10Y and E7Q. These residues are therefore not recent innovations in the family Polynoidae but could be inherited from ancestral species that evolved under hypoxic conditions. B10Y and E7Q have been shown to be responsible for the very high oxygen affinity of the globins of Ascaris suum (an intestinal parasite of pigs). They act mostly through the low oxygen dissociation rate that they confer (Davenport 1949 in Peterson et al. 1997; De Baere et al. 1994). The replacement of the otherwise conserved distal histidine (E7H) by a glutamine (E7Q), and of the leucine at B10 (B10L) by a tyrosine (B10Y), appears to be a common convergent feature in many invertebrate globins (Weber and Vinogradov 2001). It could represent an adaptation to hypoxia. Even so, not all invertebrate globins possess the very high oxygen affinity observed in A. suum. The invertebrate globins that rank next in oxygen affinity have P50 values at least 10 times higher (i.e., lower Hb–O2 affinity) than Ascaris Hb. This property mostly depends on the conformation of the heme pocket (Peterson et al. 1997).

The homology models of two polynoid globins, B. symmytilida and B. trifurcus, show that B10Y and E7Q point towards the heme group. It is tempting to suggest that these residues contribute, as in A. suum, to the high oxygen affinity measured for both tetra-domain hemoglobins found in the coelomic fluid of Branchipolynoe (Hourdez et al. 1999b). Such a configuration would, however, be expected regardless, since the template used for this analysis had the same residues pointing towards the heme group.

Nevertheless, the functional analyses of recombinant globins from the vent species show P50 values 26–32 times lower than that of the Eulagiscinae globin, which possesses a leucine at both of these positions. Many other substitutions in the Eulagiscinae globin could contribute to the observed difference in affinity. However, the two positions discussed have been experimentally shown to affect oxygen binding most profoundly in other invertebrates (extensively reviewed in Weber and Vinogradov 2001). The slight difference between the P50 values of B. symmytilida and B. trifurcus could be explained by the single replacement of a valine by a threonine in the heme pocket (position E11). Indirect effects of amino acid changes elsewhere in the molecule cannot be discounted, but E11 is the only distal heme contact that differs between the two species.

Despite many substitutions, the branches between the littoral and the deep-sea species do not exhibit any signature of positive selection. This suggests that no major change was necessary for this protein to function under the high hydrostatic pressure experienced by all the other species in our study. This agrees with the observation that hydrostatic pressure does not induce denaturation or structural changes in proteins when temperature is constant (Mozhaev et al. 1996), as in deep-sea environments.

In the Branchipolynoe lineage, two amino acids, 56T (position E11) and 82S (position F6), were found to be under positive selection, suggesting that this lineage underwent a more recent adaptive change. At position 56, valine was replaced by threonine, a residue of similar size but with a hydroxyl group capable of forming hydrogen bonds. Because this residue lies in the E helix and faces the heme group, the replacement could influence O2 binding. Serine 82 in the F helix has a smaller side chain than glutamine and fewer hydrogen-bonding possibilities, and could therefore alter the local hydrophobicity.

Likelihood ratio tests can be especially conservative for short proteins (~ 100 codons; Anisimova 2003), a length close to the ca. 135 codons of globins. This could explain why the residue at position B7 was not identified as under positive selection, even though B7V is shared in the Branchipolynoe lineage (and found in the Eulagiscinae globin). Asparagine (position 23) is a polar, hydrophilic residue, whereas valine is non-polar with a short side chain. Its replacement by valine could reinforce the hydrophobic character of the central part of the B helix.

Residues located at B7 and F6 could affect subunit interactions between single-domain globins in Branchipolynoe. The dimer interactions in Lumbricus terrestris hemoglobin are established through residues in the E and F helices (Royer et al. 2000), an interaction in which F6S could participate. In L. terrestris, dimers form tetramers mainly through interactions of the loop formed by the AB corner. B7V is close to the AB corner and could be involved in interactions that form a multimer. The formation of multimeric assemblages may be beneficial because these hemoglobins are extracellular, and a larger molecular mass minimizes their loss through excretion (Weber and Vinogradov 2001). However, the recombinant B. symmytilida globin does not differ in native mass from the globins of the two other species (as estimated from the elution volume in size exclusion chromatography). This argues against a difference in polymerization state. The absence of homotropic (cooperativity) or heterotropic (e.g., Bohr effect) interactions also argues for an absence of polymerization. Even so, some multimeric globins also lack these properties (Royer et al. 2001), such as the Branchipolynoe tetra-domain Hbs (Hourdez et al. 1999b) and Ascaris Hb (Gibson and Smith 1965; Okazaki and Wittenberg 1965).

Positive Selection and Molecular Innovation

The hydrothermal vent scale-worms studied here are all exposed to generally hypoxic conditions (Hourdez and Lallier 2007). Closer to the fluid source, the proportion of hydrothermal fluid in the mix increases, the temperature rises, and the amount of oxygen decreases. The oxygen affinity of the globins parallels this oxygen gradient. The highest P50 (i.e., lowest affinity) occurs in the species exposed to the highest oxygen partial pressure (Eulagiscinae), and the lowest P50 (highest affinity) in the species exposed to the lowest average oxygen partial pressure (B. trifurcus).

Interestingly, positive selection was not detected on any branch representing a major ecological shift. Instead, it occurred on the branch leading to both Branchipolynoe species. In this genus, there are two main tetra-domain hemoglobins in the coelomic fluid, and these exhibit different sensitivities to CO2 (Hourdez et al. 1999b). This is reminiscent of ‘class II’ fish, in which the hemoglobins found in the erythrocytes have different functional properties and sensitivities to effectors that reflect a division of labor (Weber 2000). In Branchipolynoe, this division of labor may extend to the single-domain globins, also found in the coelomic fluid. In the coelomic fluid of Branchinotogluma (sister clade of Branchipolynoe), there is only one tetra-domain hemoglobin (Hourdez, unpublished data). The positively selected positions in the Branchipolynoe clade could be a consequence of the appearance of the second tetra-domain globin. Species of this genus live inside the mantle cavity of bathymodioline mussels, where hypoxia can be severe. Females stay within the valves of the host and are quite territorial, tolerating only mobile ‘dwarf’ males for reproduction (Jollivet et al. 2000). These mussels rely on symbiotic thioautotrophic and/or methanotrophic bacteria for at least part of their nutrition (Childress and Fisher 1992). They circulate water laden with sulfide and/or methane to meet their bacteria’s metabolic needs. This hypoxic water, however, also surrounds all other vent species, with the level of hypoxia depending on the proportion of hydrothermal fluid in the mix. When the mussel closes its valves, the worms could be exposed to more severe hypoxia, and the modifications found could help them cope with these conditions.

The absence of detected positive selection on branches representing ecological shifts could be due to limitations of the method used. Indeed, globins tend to accumulate substitutions at a greater rate than other proteins. If an episode of positive selection happened on much deeper branches, the mutations accumulated since then could make the event more difficult to detect. Deeper in the phylogeny of these fast-evolving molecules, our confidence in the reconstruction of the ancestral state of each position also decreases greatly, which limits our ability to detect older events of positive selection. However, in the tetra-domain hemoglobins from Branchipolynoe, a previous study showed that the initial domain duplication was accompanied by positive selection on amino acids at the interface between two domains, possibly in response to structural constraints (Projecto-Garcia et al. 2015).