Introduction

Intensified natural selection is expected to drive new organismal adaptations at the molecular and morphological levels in lineages that invade new ecological zones (Goodman et al. 1982). The transition of cetaceans from land to sea early in their evolutionary history is a prime example: as a consequence of this change in lifestyle, dolphins and whales present some of the most drastic adaptations among mammals (Nery et al. 2013) and, compared with their terrestrial relatives, show extreme changes in physiology and morphology (Muizon 2009).

One of the biggest challenges of living exclusively in an aquatic environment is acute tissue hypoxia during diving. Cetaceans have evolved several physiological, anatomical, and behavioral strategies to cope with extended periods of limited O2 availability while submerged (Butler and Jones 1997; Butler 2004). These strategies are well known and include, among others, saving O2 by reducing the metabolic rate and selective vasoconstriction to ensure O2 supply to sensitive organs (for a review, see Ramirez et al. 2007). In particular, one recognized adaptation for sustained breath-hold diving is the massive overexpression of myoglobin (Mb) in aerobic muscles; this protein binds oxygen and delivers it during extended periods of hypoxia. The increased O2 storage capacity afforded by high Mb concentrations is known to increase the aerobic dive limit of animals, and in cetaceans it appears to be a limiting factor for diving (Dolar et al. 1999; Kooyman and Ponganis 1998; Polasek and Davis 2001). In addition, there is growing appreciation that myoglobin acts as a nitric oxide producer during hypoxia, which has important consequences for the regulation of metabolism and blood flow in a variety of tissues besides muscle (see Cossins and Berenbrink 2008; Hendgen-Cotta et al. 2008; Hendgen-Cotta et al. 2010). Nitric oxide limits the tissue damage caused by restricted blood flow, a role that is especially important for a group that uses selective vasoconstriction during prolonged dives.

Oxygen transport proteins such as myoglobin are useful systems for studying the relationship between environmental conditions and molecular adaptation, because they have developed complex regulatory mechanisms that optimize the oxygenation-deoxygenation cycle according to the physiological needs of a given species in a given habitat (Di Prisco et al. 1991). This characteristic makes them likely targets of natural selection in organisms living in extreme environments, such as those characterized by periods of hypoxia or ischemia. With regard to cetaceans, it is tempting to think that the evolution of subaquatic behavior was associated with changes in the physicochemical characteristics of Mb, such as lower oxygen affinity and a larger Bohr effect, or higher molecular stability, which in turn would translate directly into improved whole-organism diving performance.

Given the importance of Mb in cetacean life, the Mb gene is a strong candidate for positive selection in the cetacean lineage. Accordingly, using a comparative evolutionary framework, the main objectives of this study were (1) to analyze the evolutionary pattern of the Mb gene in the cetacean lineage and among cetacean families that differ in their diving ability, and (2) to detect positive selection at the amino acid level in this protein.

Materials and Methods

Taxonomic Sampling

We obtained myoglobin DNA sequences for ten cetacean species (Sei whale, Balaenoptera borealis; Bryde’s whale, Balaenoptera edeni; Minke whale, Balaenoptera acutorostrata; Melon-headed whale, Peponocephala electra; Pygmy sperm whale, Kogia breviceps; Sperm whale, Physeter catodon; Indo-Pacific beaked whale, Indopacetus pacificus; Stejneger’s beaked whale, Mesoplodon stejnegeri; Bottlenose dolphin, Tursiops truncatus; Spotted dolphin, Stenella attenuata), representing four families: Balaenopteridae, Physeteridae, Ziphiidae, and Delphinidae. Among these morphologically and physiologically diverse cetaceans, the families Ziphiidae (beaked whales) and Physeteridae (sperm whales) are known as the best divers of all marine mammals (Tyack et al. 2006; Watwood et al. 2006; Baird et al. 2008). Although these two families do not form a monophyletic group, we refer to them jointly as “long-diving” groups. To attain broad and balanced taxonomic coverage, we also obtained myoglobin sequences from 18 terrestrial mammals, representing the superorders Laurasiatheria (Cow, Bos taurus; Sheep, Ovis aries; Pig, Sus scrofa; Dog, Canis familiaris; Horse, Equus caballus; Microbat, Myotis lucifugus; Hedgehog, Erinaceus europaeus), Euarchontoglires (Mouse, Mus musculus; Rat, Rattus norvegicus; Mole rat, Spalax carmeli; Black-lipped pika, Ochotona curzoniae; Orangutan, Pongo abelii; Macaque, Macaca mulatta; Chimpanzee, Pan troglodytes; Human, Homo sapiens), and Afrotheria (Rock hyrax, Procavia capensis), the infraclass Marsupialia (Tasmanian devil, Sarcophilus harrisii), and the order Monotremata (Platypus, Ornithorhynchus anatinus). The tree topology used for the myoglobin analyses is depicted in Fig. 1. All sequences were obtained from the GenBank and Ensembl databases (Supplementary Table 1). Nucleotide sequences were aligned using MUSCLE (Edgar 2004).

Fig. 1

The tree topology used to conduct the analyses of variable ω among lineages. This tree is based on published literature (Agnarsson and May-Collado 2008; Hallström and Janke 2008; Nery et al. 2012)

Natural Selection Analysis

To investigate the possible role of positive Darwinian selection in the evolution of the myoglobin gene in cetacean clades with contrasting diving abilities, we explored variation in ω, the ratio of the rate of non-synonymous substitutions (dN) to the rate of synonymous substitutions (dS), in a maximum likelihood framework using the codeml program from PAML v4.4 (Yang 2007). We compared two sets of models: the first compared changes in ω (= dN/dS) along the branches of the tree (branch-models), and the second compared changes in ω across sites in the alignment between background and foreground sets of branches (branch-site models).
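codeml estimates ω by maximum likelihood under a codon substitution model, which cannot be reproduced in a few lines. As a rough illustration of what the ratio measures, the sketch below instead uses the older counting approach (Nei–Gojobori style counts with a Jukes–Cantor correction), which is a different estimator from the one used in this study. It assumes the numbers of non-synonymous and synonymous sites and differences have already been counted for one pair of sequences; the function names and counts are hypothetical.

```python
import math


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor corrected distance for a proportion p of differing sites."""
    if not 0.0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75) for the correction to be defined")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def omega(nonsyn_diffs: float, nonsyn_sites: float,
          syn_diffs: float, syn_sites: float) -> float:
    """omega = dN / dS from counted differences and precomputed site numbers."""
    d_n = jukes_cantor(nonsyn_diffs / nonsyn_sites)  # non-synonymous distance, dN
    d_s = jukes_cantor(syn_diffs / syn_sites)        # synonymous distance, dS
    if d_s == 0.0:
        raise ValueError("dS is zero; omega is undefined")
    return d_n / d_s


# Hypothetical counts for one pair of sequences (not data from this study).
print(round(omega(nonsyn_diffs=6, nonsyn_sites=350, syn_diffs=12, syn_sites=100), 3))
```

Values well below 1, as here, indicate that most amino acid changes have been removed by purifying selection; ω = 1 corresponds to neutral evolution and ω > 1 to positive selection.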

We implemented the following branch-models: (1) the one-ratio model, which assumes the same ω ratio for all branches; (2) a free-ratio model, which assumes an independent ω ratio for each branch; and (3) several intermediate models that estimate ω values on lineages of interest. The intermediate models were designed to investigate variation in ω among cetacean clades with contrasting diving abilities: (i) a 2ω model, which assigned one ω to all non-cetacean branches and another to the ancestral and descendant branches of all cetacean families (Fig. 2a); (ii) a 3ω model, which assigned one ω to the ancestral and descendant branches of the “long-diving” families (Physeteridae and Ziphiidae), a second ω to the ancestral and descendant branches of the other two cetacean families (Balaenopteridae and Delphinidae), and a third ω to all non-cetacean branches (Fig. 2b); (iii) a 5ω model, which estimated separate ω values for the ancestral branches and for the descendant branches of the “long-diving” families (Physeteridae and Ziphiidae) and of the other two cetacean families, and a fifth ω for all non-cetacean mammals (Fig. 2c); and (iv) a 9ω model, which estimated separate ω values for the ancestral and descendant branches of each cetacean family in the tree (Balaenopteridae, Physeteridae, Delphinidae, and Ziphiidae) and a ninth ω for all non-cetacean branches (Fig. 2d).
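In codeml, branch-model hypotheses like these are usually specified by marking branches in the tree file with labels such as #1 and #2 and setting `model = 2` in the control file, so that each label class receives its own ω while unlabeled branches share a background ω. The sketch below writes such a labeled Newick string from a nested-tuple tree. The toy tree, the abbreviated tip names, and the labeling rule for the 2ω model (mark every branch whose descendants are all cetaceans, including internal branches between families) are assumptions for illustration, not the tree of Fig. 1.

```python
from typing import Callable, FrozenSet, Tuple, Union

# A rooted tree as nested tuples; strings are tips whose names must match the alignment.
Tree = Union[str, Tuple["Tree", ...]]


def tip_set(node: Tree) -> FrozenSet[str]:
    """All tip names below (and including) this node."""
    if isinstance(node, str):
        return frozenset([node])
    return frozenset().union(*(tip_set(child) for child in node))


def to_labeled_newick(node: Tree,
                      branch_class: Callable[[FrozenSet[str]], int],
                      is_root: bool = True) -> str:
    """Serialize to Newick, appending ' #k' to the branch above each node of class k > 0.

    Class 0 is left unlabeled, i.e. it falls into codeml's background omega.
    """
    if isinstance(node, str):
        text = node
    else:
        text = "(" + ",".join(to_labeled_newick(c, branch_class, False) for c in node) + ")"
    if is_root:
        return text + ";"  # the root has no branch above it, so it is never labeled
    label = branch_class(tip_set(node))
    return f"{text} #{label}" if label else text


# Hypothetical toy subset with abbreviated tip names (not the full tree of Fig. 1).
CETACEANS = frozenset({"Bbor", "Bede", "Pcat", "Kbre"})
TOY_TREE = ("Human", ("Pig", ("Cow", (("Bbor", "Bede"), ("Pcat", "Kbre")))))


def two_omega(tips: FrozenSet[str]) -> int:
    """2-omega rule (assumed): class 1 for any branch whose descendants are all cetaceans."""
    return 1 if tips <= CETACEANS else 0


print(to_labeled_newick(TOY_TREE, two_omega))
# prints (Human,(Pig,(Cow,((Bbor #1,Bede #1) #1,(Pcat #1,Kbre #1) #1) #1)));
```

The other intermediate models would follow by swapping in a different `branch_class` function that returns 1, 2, 3, ... for the relevant sets of tips.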

Fig. 2

Graphical representation of the nested models implemented in the myoglobin analyses. The numbers in the figure represent different ω values. a A 2ω model, which assigned one ω to all non-cetacean branches and another to the ancestral and descendant branches of all cetacean families. b A 3ω model, which assigned one ω to the ancestral and descendant branches of the “long-diving” families (Physeteridae and Ziphiidae), a second ω to the ancestral and descendant branches of the other two cetacean families (Balaenopteridae and Delphinidae), and a third ω to all non-cetacean branches. c A 5ω model, which estimated separate ω values for the ancestral branches and for the descendant branches of the “long-diving” families (Physeteridae and Ziphiidae) and of the other two cetacean families, and a fifth ω for all non-cetacean mammals. d A 9ω model, which estimated separate ω values for the ancestral and descendant branches of each cetacean family in the tree (Balaenopteridae, Physeteridae, Delphinidae, and Ziphiidae) and a ninth ω for all non-cetacean branches. The shaded model is the one that best fitted the data

We also applied branch-site models, which allow ω to vary among sites along a specific branch of the tree, to assess changes in the selective regime of the long-diving lineages (Yang and Dos Reis 2011) and to test whether these lineages share sites under positive selection that could have evolved convergently and might account for their great diving capacity. For this purpose, the ancestral branches of the Physeteridae and Ziphiidae were labeled as foreground branches in two independent runs. We compared the modified model A (Zhang et al. 2005), in which some sites are allowed to have ω > 1 on the foreground branch, with the corresponding null model of neutral evolution, in which ω for those sites is fixed at 1. In all cases, three starting values of ω (0.5, 1, and 2) were used to check for multiple local optima. All nested models were compared using likelihood ratio tests (LRTs).
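For the branch-site test, the statistic is twice the log-likelihood difference between modified model A and its null with ω fixed at 1. Because ω = 1 lies on the boundary of the parameter space, the null distribution is often taken as a 50:50 mixture of a point mass at 0 and χ² with 1 df, while χ² with 1 df alone is the more conservative choice; the text does not state which was used here. The minimal sketch below reports both, using hypothetical log-likelihoods and a hypothetical function name.

```python
import math
from typing import Tuple


def branch_site_test(lnl_null: float, lnl_alt: float) -> Tuple[float, float]:
    """LRT for the branch-site test (null: model A with omega2 fixed at 1).

    Returns (2 * delta lnL, p-value) with the p-value taken from a 50:50 mixture
    of a point mass at 0 and a chi-square distribution with 1 degree of freedom.
    """
    stat = max(0.0, 2.0 * (lnl_alt - lnl_null))  # small negative values are numerical noise
    if stat == 0.0:
        return stat, 1.0
    # Upper tail of chi-square with df = 1: P(X > x) = erfc(sqrt(x / 2)).
    p_chi2_df1 = math.erfc(math.sqrt(stat / 2.0))
    return stat, 0.5 * p_chi2_df1


# Hypothetical log-likelihoods (not the values obtained in this study).
stat, p = branch_site_test(lnl_null=-3660.00, lnl_alt=-3658.90)
print(f"2dlnL = {stat:.2f}, mixture P = {p:.3f}, chi2(df=1) P = {2 * p:.3f}")
```

Under the mixture the 5% critical value is about 2.71, and under χ² with 1 df it is about 3.84.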

Detecting Changes in Physicochemical Properties of Amino Acids

To detect significant physicochemical amino acid changes among residues in myoglobin, we used the algorithm implemented in TreeSAAP 3.2 (Woolley et al. 2003). TreeSAAP compares the magnitude of the physicochemical property changes produced by non-synonymous substitutions across a phylogeny and indicates which amino acid properties have likely been affected by positive destabilizing selection during the evolutionary process. Within TreeSAAP, the magnitudes of non-synonymous changes are classified into eight categories according to the change in a given physicochemical property, from conservative (1–3) to very radical (6–8). A z-score was calculated for each category. Significant positive z-scores indicate that a given region is under the influence of positive selection (i.e., the number of inferred amino acid replacements significantly exceeds the number expected by chance). For our purposes, only properties with significant positive z-scores in categories 6–8 (i.e., the most extreme categories of structural or functional change) were considered affected by positive destabilizing selection. The identified properties were then subjected to a sliding window analysis to identify which specific regions of the protein differ significantly from a neutral model. Sliding windows of 10, 15, 20, and 30 codons were tested to determine the width that maximizes the signal. We also identified the particular amino acid residues within each region found to be under positive selection for each property.
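The sliding-window step can be sketched as follows. This is a simplified binomial stand-in, not TreeSAAP's own statistic: it assumes per-codon counts of inferred non-synonymous changes, and of those falling in the radical categories (6–8), are already available, and that a neutral probability `p_radical` of a change being radical is given. All names and the toy data are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple


def window_z_scores(total: Sequence[int], radical: Sequence[int],
                    p_radical: float, width: int) -> List[float]:
    """z-score of radical substitutions in each window of `width` consecutive codons.

    total[i]   -- inferred non-synonymous changes at codon i
    radical[i] -- how many of those fall in the radical categories (6-8)
    p_radical  -- assumed probability that a change is radical under neutrality
    """
    if len(total) != len(radical):
        raise ValueError("total and radical must have the same length")
    if not 0.0 < p_radical < 1.0:
        raise ValueError("p_radical must be strictly between 0 and 1")
    scores = []
    for start in range(len(total) - width + 1):
        n = sum(total[start:start + width])
        k = sum(radical[start:start + width])
        if n == 0:
            scores.append(0.0)  # no changes in this window: no evidence either way
            continue
        expected = n * p_radical
        sd = math.sqrt(n * p_radical * (1.0 - p_radical))
        scores.append((k - expected) / sd)
    return scores


def best_width(total: Sequence[int], radical: Sequence[int], p_radical: float,
               widths: Sequence[int] = (10, 15, 20, 30)) -> Optional[Tuple[int, float]]:
    """Return (width, peak z) for the window width with the highest peak z-score."""
    best = None
    for w in widths:
        scores = window_z_scores(total, radical, p_radical, w)
        if scores and (best is None or max(scores) > best[1]):
            best = (w, max(scores))
    return best


# Hypothetical 40-codon toy profile: one change per codon, radical ones clustered at the end.
total = [1] * 40
radical = [0] * 30 + [1] * 10
print(best_width(total, radical, p_radical=0.25))
```

Wider windows dilute a tightly clustered signal, while narrower ones are noisier, which is why several widths are compared.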

Results and Discussion

Variation in Omega Value

To investigate the possible role of positive selection in the evolution of the cetacean myoglobin gene, we implemented several branch-models defined on ecological grounds, using the maximum likelihood approach implemented in PAML. The one-ratio model assumes the same ω ratio for all lineages; its log-likelihood was ℓ₀ = −3711.43, with ω = 0.08. The free-ratio model, which assumes an independent ω ratio for each branch of the tree, was applied to the same data; its log-likelihood was ℓ₁ = −3654.95, with ω ratios varying among lineages. We compared these two models with a likelihood ratio test (LRT) to determine whether the free-ratio model provided a significantly better fit. The test statistic was 2Δℓ = 2(ℓ₁ − ℓ₀) = 112.96, and comparison with the χ² distribution (df = 51) rejects the one-ratio model (P = 10⁻⁵), indicating that ω differs among lineages. Because the free-ratio model fits the data better, intermediate models were implemented to investigate the potential role of positive selection in the cetacean lineages (Fig. 2). The 2ω model (Fig. 2a), which distinguishes between cetaceans and non-cetaceans, fitted significantly better than the one-ratio model (LRT = 23.26, P < 10⁻⁵). The ω value estimated for cetaceans was 0.25, more than three times the value estimated for non-cetacean mammals (ω = 0.07) (Table 1). Parameter values, LRT statistics, and P values for all models are summarized in Table 1.
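The LRT arithmetic above can be checked with a short sketch. It computes 2Δℓ from the reported log-likelihoods and the χ² upper-tail probability through the power series of the regularized incomplete gamma function, using only the standard library. The degrees of freedom (51 and 1) are taken from the text; `chi2_sf` and `lrt` are hypothetical helper names, and `scipy.stats.chi2.sf` would be the usual library routine.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for a chi-square variable with `df` degrees of freedom.

    Computes 1 - P(a, z), where P is the regularized lower incomplete gamma
    function with a = df / 2 and z = x / 2, evaluated by its power series.
    Adequate for moderate df and statistics like those reported here.
    """
    if x <= 0.0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a  # n = 0 term of sum_n z**n / (a (a+1) ... (a+n))
    total = term
    n = 0
    while term > total * 1e-15:
        n += 1
        term *= z / (a + n)
        total += term
    lower = total * math.exp(-z + a * math.log(z) - math.lgamma(a))
    return max(0.0, 1.0 - lower)


def lrt(lnl_null: float, lnl_alt: float) -> float:
    """Likelihood ratio statistic 2 * (lnL_alt - lnL_null)."""
    return 2.0 * (lnl_alt - lnl_null)


# One-ratio (l0) versus free-ratio (l1) model, log-likelihoods reported in the text.
stat = lrt(-3711.43, -3654.95)
print(f"2dl = {stat:.2f}, P = {chi2_sf(stat, 51):.2e}")
# 2-omega model versus one-ratio model: LRT = 23.26 with one extra parameter.
print(f"P = {chi2_sf(23.26, 1):.2e}")
```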

Table 1 Log likelihood and omega values estimated under different lineage models for myoglobin

Our results reveal that selective pressure varies among mammalian lineages. Although all ω estimates were below 1, cetacean branches had ω ratios more than three times higher than those of all other mammals. We therefore argue for a strong selective event using a phylogenetic criterion, i.e., the ratio of the ω value in the group of interest to that in a control group. Although the statistical criterion of ω > 1 is often not met, this alternative approach to inferring positive selection has been used in the literature (e.g., Opazo et al. 2005; Shen et al. 2010; Tomasco and Lessa 2011), and we argue that there are strong biological arguments for an episode of positive selection.

Although the best-fitting model was the one that distinguishes between cetaceans and non-cetaceans (Fig. 2a), careful inspection of the ω values estimated under the other branch-models reveals interesting patterns (Table 1). For example, under the 3ω model (Fig. 2b), the long-diving families (Physeteridae and Ziphiidae) had higher ω values than the other two cetacean families, four times higher than those of non-cetaceans (Table 1). When ω values were estimated independently for ancestral and descendant branches (5ω and 9ω models; Fig. 2c, d), the ω values for the ancestral branches were always higher than those for the descendant branches, and this pattern was most conspicuous in the clade containing the Ziphiidae (Table 1). This molecular evolutionary pattern has been described previously as “positive selection causes purifying selection”: in the first stages of functional differentiation, amino acid changes that remodel the molecule could be driven to fixation by positive Darwinian selection, whereas at later stages the previously fixed changes would be preserved by purifying selection to maintain the newly acquired physiological role (Goodman et al. 1982).

A major challenge in detecting signatures of natural selection is determining whether an increase in ω is due to positive selection or to the confounding effects of population demographic history, such as a decrease in population size (Nikolaev et al. 2007; Popadin et al. 2007). These effects can be distinguished because demographic changes are expected to affect all genes along a lineage, producing a genome-wide effect on the ω ratio, whereas a change in selection intensity need not affect all genes to the same extent (Fay and Wu 2003). One way to address this problem is to compare the variation of ω in myoglobin with that reported in a recent genome-scale study of the bottlenose dolphin. McGowen et al. (2012) compared approximately 10,000 protein-coding genes from the bottlenose dolphin and other mammalian genomes and estimated a mean dN/dS of 0.15 for the non-cetacean genomes (cow, horse, and dog). In our study, myoglobin shows an average dN/dS of 0.07 in non-cetacean mammals, indicating a relatively strong level of functional constraint on this gene. Moreover, McGowen et al. (2012) reported a mean dN/dS of 0.19 for the cetacean genome, whereas the dN/dS of cetacean myoglobin in our study is higher (0.25). In other words, the genome-wide mean rises only from 0.15 to 0.19 (about 1.3-fold) in cetaceans, whereas myoglobin ω rises from 0.07 to 0.25 (about 3.6-fold). Thus, the myoglobin pattern departs from the genome-wide mean for cetaceans, suggesting that the increase in ω reflects natural selection rather than a change in population size alone.

The analyses discussed above assume that all amino acid sites evolve under the same selective pressure. This assumption is unrealistic, because positive selection often acts on a few sites over a short period of evolutionary time, and its signal may be swamped by ubiquitous purifying selection (Golding and Dean 1998). Thus, even when the overall ω estimate shows no signal of positive selection, a few sites may still be under positive selection.

We also explored variation in evolutionary rates at specific amino acid sites in the “long-diving” lineages. To do this, we used branch-site models, in which the phylogenetic tree was divided into a foreground branch (in separate runs, the branch leading to the last common ancestor of Ziphiidae or the branch leading to the last common ancestor of Physeteridae) and background branches (all other branches in the tree). When the ancestral branch of Ziphiidae was labeled as the foreground branch, the model that allows a class of sites with ω > 1 did not fit significantly better than the null model in which ω for that class was fixed at 1 (data not shown). The same result was obtained when the ancestral branch of Physeteridae was labeled as the foreground branch.

Although no statistical evidence of positive selection was obtained in this case, the myoglobin sequence alignment reveals specific substitutions in both long-diving groups (Fig. 3). For example, the Physeteridae and Ziphiidae share two parallel substitutions: at site 4 (Asp→Glu; both negative/polar and relatively small), a site known to be involved in Mb salt bridges, which play an important role in molecular folding and thus in the stability of the secondary and tertiary structure (Romero-Herrera et al. 1979; Lambright et al. 1989), and at site 12 (Asn→His; neutral/polar and small to positive/polar and large). The first replacement is conservative, whereas the second involves changes in both charge and volume. Besides these two substitutions, there are group-specific substitutions. In Physeteridae, site 45 carries an arginine, whereas all other species have a lysine. These two amino acids belong to the same physicochemical class (positive/polar and large), and the change appears to be neutral; this site is known to be part of the primary pathway of ligand binding in Mb (Huang and Boxer 1994; Lambright et al. 1994). Another substitution restricted to Physeteridae is at site 151 (Phe→Tyr; neutral/nonpolar to neutral/polar, both large). Substitutions limited to the Ziphiidae are at sites 22 (Ala→Ser; neutral/nonpolar to neutral/polar, both small), 27 (Asp→Glu; both negative/polar and small), 118 (Arg→Lys; both positive/polar and large), and 121 (Gly/Ala→Ser; from neutral/polar Gly or nonpolar Ala to neutral/polar Ser, all small). Because several of the substitutions observed in our study differ from those previously studied, their impact cannot be predicted. Although most of the changes are conservative, the non-conservative ones are good candidates for future site-directed mutagenesis studies.
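Candidate sites like these can be screened for directly in an alignment by looking for positions where all foreground taxa share a residue that no background taxon has. The sketch below does this on a hypothetical eight-residue toy alignment, not on real myoglobin sequences; the function name is likewise hypothetical. A shared residue is only a candidate parallel substitution: because Physeteridae and Ziphiidae are not sister groups, confirming parallel change requires ancestral state reconstruction on the tree (for instance with codeml's `RateAncestor` option).

```python
from typing import Dict, List, Set, Tuple


def shared_derived_sites(alignment: Dict[str, str],
                         foreground: Set[str]) -> List[Tuple[int, List[str], str]]:
    """Sites (1-based) where all foreground taxa share one residue absent from every
    background taxon. Returns (site, sorted background residues, foreground residue)."""
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("sequences must be aligned to a common length")
    missing = set(foreground) - set(alignment)
    if missing:
        raise ValueError(f"foreground taxa not in alignment: {sorted(missing)}")
    background = [name for name in alignment if name not in foreground]
    hits = []
    for i in range(lengths.pop()):
        fg = {alignment[name][i] for name in foreground}
        bg = {alignment[name][i] for name in background}
        # One shared non-gap residue in the foreground that never occurs in the background.
        if len(fg) == 1 and "-" not in fg and not (fg & bg):
            hits.append((i + 1, sorted(bg), next(iter(fg))))
    return hits


# Hypothetical toy alignment (not real myoglobin sequences).
TOY = {
    "Cow":        "MDLSNGEW",
    "Pig":        "MDLSNGEW",
    "Tursiops":   "MDLSNGEW",
    "Physeter":   "MELSHGEW",
    "Mesoplodon": "MELSHGEW",
}
print(shared_derived_sites(TOY, {"Physeter", "Mesoplodon"}))
# prints [(2, ['D'], 'E'), (5, ['N'], 'H')]
```

Group-specific substitutions, such as those restricted to a single family, can be found with the same function by passing only that family's taxa as the foreground.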

Fig. 3

Phylogeny of cetaceans depicting specific amino acid substitutions in the myoglobin protein in “long-diving whales” (families Ziphiidae and Physeteridae). Changes in bold correspond to parallel amino acid substitutions in both lineages

Changes to Amino Acid Physicochemical Properties

Amino acid substitutions have a wide range of effects on a protein, depending on the difference in physicochemical properties and on their location in the protein structure. Our TreeSAAP analysis identified four physicochemical properties with significant radical changes among myoglobin residues: turn tendencies, alpha-helical tendencies, coil tendencies, and equilibrium constant (Table 2). Evaluation of several sliding-window sizes showed that a width of 15 codons maximized the positive selection signal (Fig. 4). Regarding alpha-helical tendencies, a reduction in this property is known to allow a more flexible alpha helix, which could increase accessibility at the protein surface, whereas an increase would produce a more rigid alpha helix, resulting in a more stable lipid raft composition (Burkin et al. 2000). The properties “turn tendencies” and “coil tendencies” relate to structural aspects and refer to the ability of an amino acid to contribute to or initiate a turn and a coil, respectively, in a protein (Charton and Charton 1983; Gromiha and Ponnuswamy 1993).

Table 2 Physicochemical properties under positive destabilizing selection in myoglobin
Fig. 4

Sliding window plots of the z-scores of radically changed amino acid properties showing protein regions under positive destabilizing selection in myoglobin sequence

Concluding Remarks

An accelerated evolutionary rate in myoglobin, a protein involved in intracellular oxygen homeostasis, was associated with cetaceans, a group of mammals that secondarily returned to the sea. Our analyses reveal parallel and group-specific amino acid changes, as well as signals of positive destabilizing selection on physicochemical properties of myoglobin. Taken together, these results suggest that the conquest of the sea imposed a regime of positive selection early in cetacean evolutionary history, and that the differences in myoglobin evolutionary rates within the cetacean lineage may reflect a change in function associated with the new demands imposed by the aquatic environment.

Very recently, two studies of myoglobin evolution in cetaceans were published. Similarly to our results, Dasmeh et al. (2013) found that the evolutionary rate of cetacean myoglobin is substantially higher than that of terrestrial mammals, and identified several mutations that together contribute to higher protein stability. Furthermore, they state that this phenotype of increased folding stability was first established in the cetacean ancestor and maintained throughout the cetacean lineages (Dasmeh et al. 2013). Interestingly, they found that three of the four Ziphiidae-specific substitutions (at sites 27, 118, and 121, discussed earlier in this paper) were key mutations for increasing protein stability. The authors argue that folding stability could be selected in response to speciation in a new habitat. The second study (Mirceta et al. 2013) modeled the evolutionary history of myoglobin to elucidate the development of maximal diving capacity during the transition from terrestrial to aquatic habitats in independent mammalian lineages. They revealed an adaptive molecular signature of elevated myoglobin net surface charge in all lineages of mammalian divers, and suggest that this convergent increase in net surface charge increases intermolecular electrostatic repulsion, permitting higher muscle oxygen storage capacities (Mirceta et al. 2013). Together with our results, these two studies provide a new perspective on the evolution of myoglobin and highlight the important role of this protein in the land-to-water transition early in the evolutionary history of cetaceans.

It is worth mentioning that inferences about functional divergence based on ω variability cannot substitute for the study of the biochemical and physiological properties of proteins. Nevertheless, the comparison of variable ratios can provide an informative step toward understanding functional divergence, because changes in evolutionary rate can be interpreted as changes in functional constraints (Naylor and Gerstein 2000). Given the limited information available on cetacean physiology, these results can guide the choice of functionally important candidate sites for subsequent experiments testing whether myoglobin function in cetaceans changed as a consequence of the transition from a terrestrial to a fully aquatic life form.