Introduction

Cetaceans (whales, porpoises, and dolphins) and pinnipeds (seals and sea lions) are aquatic mammals that evolved from different terrestrial ancestors about 50 and 30 million years ago (Ma), respectively (Fordyce 2018; Berta et al. 2018). However, cetaceans are exclusively aquatic while pinnipeds use both land and sea. Despite their differences in dependence on the aquatic environment, these groups share several morphological adaptations for underwater life. These include a streamlined body shape, modification of forelimbs into flippers, and a thick layer of fat for thermoregulation (Davis 2019). One striking characteristic of both cetaceans and pinnipeds is their diving behavior, which also demands many specific adaptations, especially in the physiology and metabolic homeostasis of hypoxia (Davis 2014; Blix 2018; Ponganis 2019).

Hypoxia is a condition in which the concentration of oxygen (O2) available for tissue use is low; it occurs in an acute form during the apneic periods of a dive (Ramirez et al. 2007). Among the adaptations to extreme hypoxia are the vasoconstriction of peripheral tissues, which redistributes blood flow to oxygen-dependent tissues, and bradycardia, the reduction of heart rate that maintains blood pressure (Zapol et al. 1979; McDonald and Ponganis 2013; Davis 2014). One of the main challenges of vasoconstriction is the potential for ischemia: limited blood flow can lead to reperfusion injury, which occurs when blood flow is restored to ischemic tissue and triggers the sudden production of reactive oxygen species (ROS), such as superoxide (O2•−) and hydrogen peroxide (H2O2) (Murphy 2009; Kalogeris et al. 2012; Hancock 2021). Under normal physiological conditions, ROS play an important role in cell signaling and metabolism. However, excessive ROS production during reperfusion, mostly from the xanthine dehydrogenase (XDH) pathway, can cause oxidative stress, leading to cellular damage (Jones 2006; Halliwell and Gutteridge 2015). This increase in ROS production is observed in aquatic mammal tissues; nonetheless, these tissues do not show more oxidative damage than those of terrestrial mammals (Elsner et al. 1998; Zenteno-Savín et al. 2002; Wilhelm Filho et al. 2002).

The remarkable tolerance to oxidative stress seen in aquatic mammals is linked to their antioxidant enzyme system. This system consists of a set of proteins with catalytic sites that stabilize and remove ROS from their tissues (Halliwell and Gutteridge 2015). The main antioxidant enzymes include catalase (CAT), glutathione peroxidase (GPX), superoxide dismutase (SOD), and peroxiredoxins (PRDX). Several studies have investigated the activity of these enzymes by comparing the tissues of aquatic and terrestrial mammals, revealing the important role of these enzymes in the protection against oxidative stress (Zenteno-Savín et al. 2011; Vázquez-Medina et al. 2012; Allen and Vázquez-Medina 2019). Furthermore, the activity and expression of these enzymes vary between cetaceans and pinnipeds, suggesting that lineages may have unique responses to oxidative stress (Wilhelm Filho et al. 2002; Cantú-Medellín et al. 2011; Geßner et al. 2022). These variations may be related to specific challenges, such as different depths and duration of dives, leading to specific adaptations in their antioxidant enzyme system.

The metabolism of aquatic mammals, including the antioxidant system, has been studied for many years from a metabolic perspective to better understand how it adjusted to the drastic change to an aquatic environment. With the advance of genomic sequencing and analysis, the availability of mammalian genomes has grown, and many genes related to hypoxia adaptations have been found to be positively selected in cetacean lineages (McGowen et al. 2014; Nery et al. 2013; Tian et al. 2016; Cabrera et al. 2021). Convergence studies between cetaceans and pinnipeds have also proven to be a valuable approach because, despite differences in habitat and diving behavior, both groups have been subjected to similar evolutionary pressures in their management of oxygen (Zhou et al. 2015; Yuan et al. 2021). The antioxidant enzymes have also been targeted by genomic studies (Tian et al. 2018, 2019); however, many of these studies use a limited number of species and focus on cetacean groups, lacking a more comprehensive comparison between the Cetacea and Pinnipedia lineages, as well as a comparison of the evolution of the antioxidant system as a whole.

In this context, to better understand the adaptation of aquatic mammals to oxidative stress, our objective was to investigate the molecular evolution of genes involved in oxidative stress derived from hypoxia in breath-hold dives. We estimated rates and identified sites under positive selection specific to each group of aquatic mammals in some of the antioxidant enzymes, which are responsible for removing ROS, and in one of the producers of ROS, the XDH enzyme. Additionally, we identified signs of a positively selected convergent mutation in two deep-diving species, the Cuvier’s beaked whale and the Southern elephant seal. Finally, we inferred the possible impacts of these selected mutations on overall protein structure and function.

Material and Methods

Genomic Data, Sequences, and Alignment

To obtain the coding sequences (CDS) of the six antioxidant genes (CAT, GPX3, GSR, SOD1, PRDX1, and PRDX3) and the xanthine dehydrogenase (XDH) gene, we retrieved the longest transcripts of these genes available from the NCBI database (https://www.ncbi.nlm.nih.gov/) for 72 species of mammals (Supplementary Table S1). In nine target species, annotated sequences were missing or incomplete, so we used the CDS retrieved from a closely related species as a query sequence in the NCBI BLAST tool (https://blast.ncbi.nlm.nih.gov/Blast.cgi), searching directly in the genome assemblies of the target species (Supplementary Table S2.1). Hits with an E-value < 0.05 and percent identity > 90% were selected as the best results (Supplementary Table S2.2). Because some of the target genomes were fragmented, the sequence fragments returned by BLAST sometimes needed to be mapped to the gene of the query species to generate a single sequence for the target species.
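The hit-selection criteria above can be illustrated with a minimal Python sketch. It is not the pipeline used in this study: the `BlastHit` record, its field names, and the example scaffolds are hypothetical, and only the two thresholds (E-value < 0.05, identity > 90%) come from the text.

```python
from dataclasses import dataclass

# Thresholds stated in the text; everything else here is illustrative.
MAX_EVALUE = 0.05
MIN_IDENTITY = 90.0


@dataclass
class BlastHit:
    subject_id: str      # scaffold or contig in the target assembly (hypothetical)
    pct_identity: float  # percent identity of the alignment
    evalue: float        # BLAST expectation value
    q_start: int         # start coordinate on the query CDS
    q_end: int           # end coordinate on the query CDS


def filter_hits(hits):
    """Keep hits with E-value < 0.05 and identity > 90%, ordered along the query."""
    kept = [h for h in hits
            if h.evalue < MAX_EVALUE and h.pct_identity > MIN_IDENTITY]
    return sorted(kept, key=lambda h: h.q_start)


# Invented hits from a fragmented assembly, for illustration only.
example_hits = [
    BlastHit("scaffold_12", 97.4, 1e-80, 301, 620),
    BlastHit("scaffold_03", 95.1, 3e-60, 1, 300),
    BlastHit("scaffold_44", 82.0, 1e-10, 610, 700),  # fails the identity threshold
    BlastHit("scaffold_07", 93.0, 0.2, 650, 690),    # fails the E-value threshold
]
ordered = filter_hits(example_hits)
```

Sorting the retained fragments by their query coordinates mirrors the step of mapping fragments back to the query gene before joining them.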

Nucleotide and amino acid sequences were aligned using the MUSCLE tool (Edgar 2004). Alignments were curated by removing or replacing incomplete sequences and tidying up mismatched regions. To obtain the codon alignments used in the evolutionary analyses, we used PRANK v.170427 (Löytynoja 2014).

Phylogenetic Inference

We reconstructed the evolutionary history of the antioxidant genes by performing a phylogenetic analysis of both nucleotide and amino acid alignments. Maximum likelihood (ML) phylogenetic reconstruction was performed using IQ-TREE v. 1.6.2 (Nguyen et al. 2015) with 1,000 bootstrap replicates to estimate branch confidence (Hoang et al. 2017), where nodes with support values ≥ 80 were considered robust. For Bayesian inference of nucleotide trees, we used PARTITION FINDER v. 2.1.1 (Lanfear et al. 2017) to estimate the partitioned evolutionary models using the Bayesian Information Criterion (BIC) (Supplementary Table S3). The models inferred for each gene were used in MRBAYES v. 3.2.7a (Ronquist and Huelsenbeck 2003), where the posterior probabilities of the nodes were calculated applying Markov chain Monte Carlo (MCMC) running for 10,000,000 generations with four chains, and trees were sampled every 100 generations.

Natural Selection Analyses

We used the codeml program in the PAML v. 4.4 package (Yang 2007) to estimate the evolutionary rates of genes under a maximum likelihood (ML) framework (Goldman and Yang 1994). Codon substitution models were applied to estimate the rates of non-synonymous (dN) and synonymous (dS) substitutions to detect positive selection. The omega ratio, ω (dN/dS), is the measure used to infer the strength of selection on codon changes: ω > 1 indicates positive selection, ω < 1 indicates purifying selection, and ω = 1 indicates neutral evolution. We used the codon alignments produced by PRANK and a species tree based on a mammalian phylogeny (Upham et al. 2019) available in VertLife (https://vertlife.org/).
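As a minimal illustration of how ω is read, the Python sketch below maps a point estimate to the three regimes named above. The function names and the numerical tolerance are assumptions made for this example; in practice, codeml estimates ω by maximum likelihood and significance is assessed with a likelihood ratio test, not from the point estimate alone.

```python
def omega_ratio(dn: float, ds: float) -> float:
    """Return omega = dN/dS; the ratio is undefined when dS is zero."""
    if ds <= 0:
        raise ValueError("dS must be positive to compute dN/dS")
    return dn / ds


def selection_regime(omega: float, tol: float = 1e-9) -> str:
    """Map an omega point estimate to the regime described in the text."""
    if abs(omega - 1.0) <= tol:
        return "neutral"
    return "positive" if omega > 1.0 else "purifying"


# Two omega values reported in the Results: the background mean and
# the Mysticeti estimate for PRDX1. Both are below 1.
background = selection_regime(0.156)
mysticeti_prdx1 = selection_regime(0.934)
```

Both values fall in the purifying range, which is why an elevated ω below 1 is described as an accelerated rate and not as positive selection.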

We employed three different methods in codeml to detect positive selection in our data: branch models (BM), branch-site models (BSM), and site models (SM) (Yang 1998; Yang and Nielsen 2002; Yang et al. 2000).

To detect selection in specific lineages, we used BM, which either estimates a single dN/dS for all branches in the tree as a null model or estimates separate ratios for foreground lineages and the background. We employed five branch models (Fig. 1): (a) one-ratio model (1ω), the null model; (b) two-ratio model (2ω), estimating one ω value for all the lineages of aquatic mammals (Cetacea + Pinnipedia), as foreground, and one for all the terrestrial mammals, as background branches; (c) three-ratio model (3ω), distinguishing Cetacea and Pinnipedia within aquatic mammals; (d) five-ratio model (5ω), separating the ancestral lineage from its recent lineages inside each aquatic mammal group; and (e) seven-ratio model (7ω), separating the recent groups of Cetacea (Mysticeti and Odontoceti) and Pinnipedia (Phocidae and Otariidae + Odobenidae). In each model, ω values were estimated for the branches previously marked in the phylogeny based on the species evolution.

Fig. 1

Omega values estimated in the branch model test with the species tree used in the codeml analysis, with the foreground branches marked for the a 2ω, b 3ω, c 5ω, and d 7ω tests. The species tree with the major groups of mammals is shown with the lineages used as foreground branches in each test in contrasting colors. Triangles at the tips represent a whole clade and lines represent ancestral lineages leading to them. In bold are the genes with significant LRT values in the most specific branch model. The rodent clade includes the Lagomorpha species

To detect positive selection acting on a few sites along specific lineages, we used the BSM analysis. This method sorts the sites into ω classes and only allows ω2 > 1 in foreground branches (Yang and Nielsen 2002). We applied the modified version (Zhang et al. 2005) using Model A as the alternative model, where there are four classes of ω and ω2 > 1 is only allowed in the foreground branches. We performed three tests: (a) convergence, with Cetacea and Pinnipedia as foreground branches; (b) Cetacea, where only cetaceans were marked as foreground; and (c) Pinnipedia, with only pinnipeds as foreground (Fig. 2). As a comparative test, the same analysis was performed using closely related groups: Artiodactyla, Carnivora (excluding Pinnipedia), and both groups marked together for convergence.

Fig. 2

Comparison of protein residues in which positive selection was identified in the BSM and complementary analyses. The residues were obtained from the protein alignment of each gene, making it possible to compare the amino acids of terrestrial mammals with the substitutions occurring in the aquatic mammal lineages. Species trees are provided showing all the species used in our analysis, and the lineages of interest used as foreground in each test are highlighted using different colors and shapes. Purple circle on the left: convergence test with Cetacea + Pinnipedia marked; Blue triangle on the right: Cetacea test; Green rectangle on the right: Pinnipedia test

Finally, we performed SM to identify specific codon sites that may be under positive selection, regardless of the lineage. In this model, each site in the alignment is distributed into different ω classes, and no foreground or background lineages are designated (Yang et al. 2000). We used the positive selection test, comparing the null model M7beta with the alternative model M8beta&ω (Supplementary Table S4).

We also performed selection analyses using the HyPhy package (Pond et al. 2005; Kosakovsky et al. 2020a). For branch-site tests, we used aBSREL and BUSTED, which estimate ω values in each branch of the tree. While aBSREL estimates the proportion of sites under positive selection in the tested lineages (Smith et al. 2015), BUSTED is a general approach for identifying gene-wide positive selection in the tested lineage (Murrell et al. 2015). We also used RELAX to calculate the strength of selection (i.e., whether it was intensified or relaxed) in our lineages of interest (Wertheim et al. 2015).

We used contrast-FEL to test for sites with different evolutionary rates between two subsets of a phylogeny evolving under different environmental conditions (Kosakovsky et al. 2020b). We also used site models such as FEL (Fixed Effects Likelihood), SLAC (Single-Likelihood Ancestor Counting), and FUBAR (Fast, Unconstrained Bayesian AppRoximation), which identify positively selected sites (PSS) in a phylogeny using maximum likelihood, counting, and Bayesian approaches, respectively (Kosakovsky et al. 2005; Murrell et al. 2013). These tests, together with SM in codeml, were used as complementary tests for sites identified in BSM (Supplementary Tables S5.1 and S5.2).

Statistical Analysis

For the codeml models, to test whether dN was significantly higher than dS, we used a likelihood ratio test (LRT) comparing the log-likelihoods of the null (l0) and alternative (l1) models with the statistic 2 × (l1 − l0) (Yang and Nielsen 1998; Yang 1998). The LRT statistic follows a χ2 distribution, and values with p < 0.05 were considered significant, supporting the alternative hypothesis.
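The likelihood ratio test can be sketched in a few lines of Python. This is a minimal example and not the script used in the study: it assumes one degree of freedom (one extra free parameter in the alternative model), for which the χ2 survival function has the closed form erfc(√(x/2)). The function names and the example log-likelihoods are illustrative.

```python
import math


def lrt_statistic(lnl_null: float, lnl_alt: float) -> float:
    """Return 2 * (l1 - l0) from the two model log-likelihoods."""
    return 2.0 * (lnl_alt - lnl_null)


def chi2_pvalue_df1(statistic: float) -> float:
    """Upper-tail chi-square probability, assuming 1 degree of freedom."""
    if statistic <= 0:
        return 1.0
    return math.erfc(math.sqrt(statistic / 2.0))


# Invented log-likelihoods whose difference gives a statistic of 6.00.
stat = lrt_statistic(lnl_null=-1000.0, lnl_alt=-997.0)
p_value = chi2_pvalue_df1(stat)
significant = p_value < 0.05
```

Under this one-degree-of-freedom assumption, a statistic of 6.00 corresponds to p ≈ 0.0143, which agrees with the value reported for the PRDX3 convergence test in the Results. Models that differ by more than one parameter would need the general χ2 distribution with the matching degrees of freedom.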

Although BSM and SM estimate ω per site and lineage and per site only, respectively, they use the same statistical method, Bayes empirical Bayes (BEB), to infer the posterior probability (pp) that each site belongs to a class with ω > 1. Sites with a significant LRT and pp > 0.9 are considered positively selected (Nielsen and Yang 1998); however, we considered only the BSM results to infer positively selected sites (PSS) in our lineages of interest, and SM was used as a complementary test, since it has no lineage specificity.
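The two-part criterion for calling a site positively selected (a significant LRT for the gene and a BEB posterior probability above 0.9 for the site) can be written as a small filter. The sketch below is illustrative only: the function name, the input layout, and the site numbers and probabilities are hypothetical, not results from this study.

```python
def positively_selected_sites(beb_pp, lrt_p, pp_cutoff=0.9, alpha=0.05):
    """Return sites with BEB pp > cutoff, but only if the gene-level LRT is significant.

    beb_pp maps alignment site number -> posterior probability of the omega > 1 class.
    """
    if lrt_p >= alpha:
        return []  # without a significant LRT, no site is reported
    return sorted(site for site, pp in beb_pp.items() if pp > pp_cutoff)


# Hypothetical BEB output for one gene (invented values).
beb = {10: 0.97, 42: 0.62, 87: 0.91}
pss_significant = positively_selected_sites(beb, lrt_p=0.002)
pss_not_significant = positively_selected_sites(beb, lrt_p=0.30)
```

Gating on the LRT first reflects the logic in the text: a high posterior probability at a site is only interpreted when the alternative model fits significantly better than the null.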

Amino Acid Properties and Protein Structure

We used the program TreeSAAP v.3.2 (Woolley et al. 2003) to evaluate the physical–chemical changes induced by non-synonymous substitutions in protein residues within a phylogenetic context. The program categorizes these amino acid mutations into eight groups based on the magnitude of their physical–chemical effects, ranging from mild to radical substitutions. We used a goodness-of-fit test, which yields a z-score, to determine whether these mutations were under positive selection. The results were treated in IMPACT-S v. 1.0.0 (Maldonado et al. 2014), generating a table with “Property by Site” where only sites with significant z-scores and substitutions in the categories 6–8 were considered as being under positive destabilizing selection.

To assess the potential impact of positively selected sites on the protein structure and function, we employed a combination of bioinformatic tools and structural modeling techniques. First, we used AlphaFold Database (Varadi et al. 2022; Jumper et al. 2021) and AlphaFold Colab (AlphaFold v2.3.1.) in the UCSF ChimeraX v. 1.5 software (Goddard et al. 2018; Pettersen et al. 2021) to model the protein structures of the target species and their close relatives (Bos taurus for Cetacea and Canis lupus familiaris or Felis catus for Pinnipedia). We then compared the regions of the protein with high confidence (pLDDT > 90) to identify potential changes induced by positively selected mutations (Supplementary Figs S22–S28).
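As a minimal sketch of the confidence filter described above, the Python snippet below keeps only sites whose per-residue pLDDT exceeds 90. It assumes the pLDDT scores have already been extracted into a dictionary keyed by residue number; the function name, the residue numbers, and the scores are hypothetical.

```python
def high_confidence_sites(plddt_by_residue, sites, cutoff=90.0):
    """Return the sites whose per-residue pLDDT exceeds the cutoff.

    Residues missing from the model are treated as low confidence.
    """
    return [s for s in sites if plddt_by_residue.get(s, 0.0) > cutoff]


# Invented per-residue scores for an illustrative model.
plddt = {10: 96.2, 42: 71.5, 87: 90.0}
comparable = high_confidence_sites(plddt, sites=[10, 42, 87, 120])
```

Only residue 10 passes here, since the threshold is strict (pLDDT > 90) and residue 120 is absent from the hypothetical model.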

To further investigate the functional impact of the positively selected sites, we checked if they had known effects on human proteins. Using PolyPhen-2 (Adzhubei et al. 2010) based on the HumDiv dataset, we compared the wild-type properties of the amino acids with the mutant properties found in human variants that harbored the same mutations as the positively selected sites. PolyPhen-2 employs a Naive Bayes posterior probability score to estimate the damaging effect of amino acid replacements, classifying them into three categories of effect in the protein: benign, possibly damaging, and probably damaging. Finally, to gain additional insights into the function and evolution of the proteins, we retrieved additional information from the UniProt (https://www.uniprot.org/). We visualized the protein models using the UCSF ChimeraX v.1.5 software.

Results

Dataset and Phylogenetic Reconstruction

We retrieved the longest and most complete transcript for each gene for 81 species from 15 mammalian orders using annotated CDS sequences and BLAST tool search (Supplementary Table S6). There was a variable number of represented species for each gene: 77 for CAT, 78 for GPX3 and PRDX1, 72 for GSR, 79 for PRDX3, 75 for SOD1, and 68 for XDH (Supplementary Table S1 and S2.2). The alignments showed conserved regions in the sequences corresponding to functional domains of the proteins, with the PRDX1 gene being the most conserved and GSR having a highly variable region at the beginning of the sequences.

The topology of the phylogenetic gene trees recovers the main mammalian groups without evident accumulation of modifications in a specific branch (Supplementary Figs S1–S21). Most mammalian families had good support values for bootstrap (> 90) and posterior probabilities (> 0.9) in the nucleotide tree, especially the Cetacea and Pinnipedia groups. Because there were mostly minor disagreements between the gene tree and species tree, we used the species tree in the PAML selection analysis for a better delimitation of the foreground branches and branch length. For the HyPhy analysis, we used the gene tree as recommended by the literature (Kosakovsky et al. 2005, 2020b; Murrell et al. 2013, 2015; Smith et al. 2015; Wertheim et al. 2015).

Evolution of Antioxidant Enzymes is Accelerated in Aquatic Mammal Lineages

We performed branch analysis using different models to understand the selection patterns in the aquatic mammal branches. In all models, the background lineages presented similar values of ω for most genes (mean ω = 0.156, SD = 0.067).

In the CAT and XDH genes, the LRT did not find significant changes in the substitution rates of foreground branches (Table 1), indicating no differences between the rates of evolution of background and foreground lineages. In BM, we took each gene's evolutionary pattern to be the most specific model with a significant ω test, since the branch partitions are nested within each other in a hierarchical structure (i.e., 2ω is the most general branch partition and, within it, 3ω, 5ω, and 7ω are progressively more specific). Differences between terrestrial and aquatic mammals (2ω) were significant in the GSR gene (Fig. 1a), suggesting that aquatic mammals (including cetaceans and pinnipeds) accumulated more mutations than the terrestrial mammals included in our dataset. The 3ω model fitted better for the SOD1 gene, with Cetacea presenting a higher ω value than Pinnipedia (Fig. 1b), suggesting that the aquatic mammal groups evolved at divergent rates for this gene. The 5ω model better fitted the PRDX3 gene, which showed a higher accumulation of non-synonymous mutations in crown lineages of aquatic mammals compared to the ancestral ones (Fig. 1c). The 7ω model, which distinguishes ω values of crown lineages of cetaceans and pinnipeds, was a better fit for the genes GPX3 and PRDX1. The GPX3 acceleration is likely caused by the higher ω values of the Odontoceti lineage (ω = 0.471), while Mysticeti had lower values (ω = 0.122). In PRDX1, all crown cetaceans accumulated mutations at higher rates than the other lineages (Odontoceti: ω = 0.509; Mysticeti: ω = 0.934) (Table 1 and Fig. 1d).

Table 1 Log-likelihood and omega values estimated for various lineages models in the aquatic mammals’ groups retrieved by the branch model analyses using codeml

Positively Selected Sites in Aquatic Mammals

For the codeml BSM, the genes PRDX3 (LRT = 21.47, p < 0.0001) and SOD1 (LRT = 9.19, p = 0.0024) with Cetacea as foreground, as well as GSR (LRT = 9.22, p = 0.0024), PRDX1 (LRT = 4.78, p = 0.0289), and XDH (LRT = 13.74, p = 0.0002) with Pinnipedia as foreground, showed a better fit under model A than under the null model (Table 2). In these genes, the positively selected sites (PSS) were identified using the Bayes empirical Bayes (BEB) method at posterior probabilities higher than 0.9, resulting in 14 PSS in PRDX3 and three PSS in SOD1 for Cetacea, as well as one PSS in PRDX1 and six PSS in XDH for Pinnipedia. No PSS with significant BEB posterior probabilities were found for GSR. For the SM, all genes except PRDX1 had a better fit for the alternative model M8, and PSS were identified, including some from BSM (Supplementary Table S4). The numbers of PSS supported by at least three methods (PAML: BSM and SM; HyPhy: Contrast-FEL, FEL, SLAC, and FUBAR) were GPX3 (4), GSR (1), PRDX3 (4), SOD1 (7), and XDH (10). For more details on the PSS results, see Supplementary Tables S5.1 and S5.2.
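The rule of retaining sites supported by at least three methods can be expressed as a simple tally. The Python sketch below is a hedged illustration and not the script used here: the function name and input layout are assumptions, and the site numbers are invented for the example.

```python
from collections import Counter


def sites_supported(calls_by_method, min_methods=3):
    """Return sites called as positively selected by at least `min_methods` methods.

    calls_by_method maps a method name to the collection of sites it reported.
    """
    counts = Counter(
        site for sites in calls_by_method.values() for site in set(sites)
    )
    return sorted(site for site, n in counts.items() if n >= min_methods)


# Hypothetical calls for one gene across the six methods named in the text.
calls = {
    "BSM": [15],
    "SM": [15, 88],
    "Contrast-FEL": [15, 201],
    "FEL": [88, 201],
    "SLAC": [201],
    "FUBAR": [15, 88, 201],
}
consensus = sites_supported(calls)
```

Converting each method's list to a set before counting ensures a site reported twice by the same method still counts as a single line of support.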

Table 2 Branch-site tests performed in the PAML (for each comparison of null and alternative models, the respective LRT, P value, and sites identified are given) and HyPhy (aBSREL, BUSTED, and RELAX) packages using three approaches of foreground branch delimitation: each marine mammal clade independently (Cetacea and Pinnipedia) and the convergence test marking them both as foreground

We checked whether the PSS identified in BSM analyses were exclusive to the aquatic mammal lineages. The GPX3 gene shows the same mutations in cetacean groups and many other terrestrial species. In the GSR gene, contrast-FEL, FEL, FUBAR, and SLAC support a positive selection event on site 450 (Supplementary Table S5.1 and S5.2), with an amino acid change from arginine (Arg) to serine (Ser) in three pinniped species (Odobenus rosmarus, Neomonachus schauinslandi, and Phoca vitulina). However, such change is also shared with Eubalaena japonica, Bos taurus, and Bubalus bubalis. For the PRDX3 gene, only the PSS 209 had mutations that were exclusive to cetaceans. For SOD1, five out of seven PSS had exclusive changes in cetaceans (54, 103, 110, 114, and 115). Finally, for XDH, only 4 out of 10 sites had specific changes in pinnipeds (78, 434, 591, and 595). See the amino acid substitutions for these sites in Fig. 2.

For the HyPhy branch-site methods, no evidence of positive selection was identified in the pinniped group. On the other hand, for cetaceans, the RELAX test identified evidence of relaxation in the genes CAT, GPX3, and PRDX1 and of selection intensification in the PRDX3 gene. For the GSR gene, the aBSREL test indicated a signal of positive selection in the clade Delphinidae (Tursiops truncatus, Orcinus orca, Lagenorhynchus obliquidens, and Globicephala melas), suggesting that this group may have experienced a different selective regime during cetacean evolution. This test also showed a signal of positive selection in SOD1 for the Lipotes vexillifer lineage, but when the alignment was checked, sites with exclusive mutations for this species were not identified as PSS by any other method.

The BSM using the closely related groups Artiodactyla, Carnivora (excluding Pinnipedia), and their combination as a convergence test (Artiodactyla + Carnivora) showed a better fit of the alternative model A for SOD1 in Artiodactyla, with sites 25 and 37 as PSS, and for XDH in Carnivora, with sites 575, 842, 892, and 1491 as PSS (Supplementary Table S7). The PSS identified here for SOD1 and XDH were not the same as those found for any lineage of aquatic mammals, suggesting that selection is not acting on the same sites in closely related groups.

Convergent Evolution

To test for convergent selection between cetaceans and pinnipeds, we used the same codeml branch-site approach but selected both aquatic mammal groups as foreground branches. We found evidence for positive selection on the PRDX3 gene (LRT = 6.00; p = 0.0143), where site 232 (PP = 0.986) changes from glycine (Gly) to serine (Ser) only in Ziphius cavirostris (Odontoceti, Cetacea) and Mirounga leonina (Phocidae, Pinnipedia) (Fig. 2).

Change in Amino Acid Properties and 3D Modeling

We investigated the impact of the aquatic mammal PSS mutations on protein structure. Using TreeSAAP, we identified radical changes in physicochemical properties (substitutions in the categories 6–8) for cetacean PSS found in genes GPX3 (167), SOD1 (2, 54, 103, 110, and 114) and for pinniped PSS in GSR (450) and XDH (78, 337, 591, 595, and 834) (Supplementary Table S8 and Table 3).

Table 3 PSS physicochemical amino acid properties variations determined by TreeSAAP and impact of mutants in human proteins by PolyPhen-2 results (“PolyPhen-2 Score” and “Score Classification”)

We then searched for known human variations matching the aquatic mammal PSS. We identified variations in the genes GPX3 (2), GSR (1), PRDX3 (1), SOD1 (2), and XDH (1) (Supplementary Table S8 and Table 3) that were not associated with diseases. For the rest of the sites, there were no known modifications or the variations were to a different amino acid in humans. However, in the SOD1 and XDH genes, certain sites had the same variations that are associated with diseases in humans. For instance, sites 54, 103, 110, 114, and 115 in SOD1 are close to sites affected by Amyotrophic lateral sclerosis 1 (ALS1) (Table 3). In the XDH gene, the substitution for a valine amino acid at site 834 was identified in some mammal species (Ornithorhynchus anatinus, Oryctolagus cuniculus, Dipodomys ordii, Peromyscus maniculatus, Cavia porcellus, Octodon degus, and Choloepus didactylus) and is associated in humans with Xanthinuria type II (ClinVar) or Hereditary xanthinuria type 1 (Supplementary Table S8). Most mutations were considered “Benign” by PolyPhen-2, indicating no negative effect on human proteins. However, we found sites that were “Possibly Damaging” and “Probably Damaging” in PRDX3, SOD1, and XDH (Table 3). Of note, site 232 in PRDX3 has been found in somatic mutations in cancers (CPTAC-3 project at NCI-TCGA).

In the protein structure modeling, we focused on the sites with changes exclusive to aquatic mammals in the genes PRDX3 (1), SOD1 (5), and XDH (4), in addition to the case of convergence in PRDX3. Most of the PSS did not result in differences in the protein structure compared to closely related species without the mutation (Fig. 3a). However, we observed that the PSS in SOD1 (sites 54, 103, 110, 114, and 115) are located spatially near the active and binding sites of the protein (Fig. 3c). The PSS in the XDH gene were located on the protein’s surface (Fig. 3d), which could affect its solubility. We found a slight displacement of the loop where site 232 is located in Ziphius cavirostris compared to Bos taurus (Fig. 3a). Notably, the PSS in PRDX3 and XDH were located in domain regions of the proteins, such as site 209 in the Thioredoxin Domain (IPR013766), site 232 in the C-terminal domain (IPR019479), site 78 in the 2Fe-2S ferredoxin-type iron–sulfur binding domain (IPR001041), and site 434 in the FAD-binding domain, PCMH-type (IPR016166) (Fig. 3a, b, and d).

Fig. 3

Protein structure modeled using the sequences of aquatic mammals in the AlphaFold Colab (AlphaFold v2.3.1.). Light blue: Domains of the protein; Red: Binding sites; Dark green: Active sites; Purple: Positively selected sites (PSS); Yellow: PSS in the closely related species; AS—Active site identification. a PRDX3 gene from the two species of Cetacea and Pinnipedia with convergent PSS, 232, in evidence, comparing with the site in Canis lupus familiaris for the Mirounga leonina and Bos taurus for Ziphius cavirostris; b PRDX3 from Tursiops truncatus with PSS 209 marked; c SOD1 for two species of cetaceans, a Mysticeti and an Odontoceti showing the position of different PSS for each group and the PSS 103, shared between them; d region of Felis catus’s XDH protein where the pinnipeds sequences were modeled with the PSS located in Odobenidae and Phocidae

Discussion

In this study, we examined the molecular evolutionary history of antioxidant enzymes, which play a critical role in eliminating reactive oxygen species (ROS) that can cause oxidative stress, especially during long dives. We focused on two groups of aquatic mammals, Cetacea and Pinnipedia, that share important physiological adaptations to apnea diving. We aimed to identify signs of positive selection in the genes encoding these enzymes and their potential impact on protein function. Our results revealed contrasting evolutionary rates among antioxidant genes and between the two groups of aquatic mammals, with accelerated rates of evolution compared to terrestrial mammals. We also identified positively selected sites with amino acid changes occurring exclusively in aquatic mammal lineages, including an example of convergent evolution.

The results from our branch models suggest an accumulation of non-synonymous mutations in lineages of aquatic mammals compared with their terrestrial counterparts. The ω values for our background species were similar to the average estimates for terrestrial species’ genomes (Yuan et al. 2021). The increase in ω values is not uniform across the aquatic lineages, occurring at a higher rate in cetaceans, consistent with the previous study, which showed higher mean ω values in cetacean genomes. The discrepancies observed between cetaceans and pinnipeds (for example, in PRDX3 and SOD1) and within crown groups of cetaceans after the split of dolphins and whales (such as in GPX3 and PRDX1) may be related to their life history, since each group presents specificities in physiology and metabolic responses (Janecka et al. 2012).

The XDH gene codes for the enzyme xanthine dehydrogenase, which under hypoxic conditions, is converted into xanthine oxidase (XO), producing H2O2 in the reaction of hypoxanthine to uric acid, causing oxidative stress during submersion (Kelley et al. 2010). XO induction during voluntarily associated apneas in elephant seal pups suggests a role in the development of the antioxidant system (Vázquez-Medina et al. 2011a). Previously, unique sites were found in the XDH gene in pinnipeds, cetaceans, and other oxidant stress-tolerant species with no positive selection signals associated with them (Tian et al. 2022). In our work, we found evidence of positive selection, but the sites were not the same as in the previous work. In addition, this same study performed an in vitro assay and showed that even apparently neutral mutations in cetaceans can impact enzyme activity, raising questions about how adaptive mutations impact XDH.

The SOD1 gene belongs to the superoxide dismutase family, the first line of defense against superoxide (O2•−) and one of the most important antioxidants. It codes the CuZn-SOD isoform found in the mitochondria (Zelko et al. 2002), which is essential in removing superoxide produced during cellular respiration and by XDH under reoxygenation (Murphy 2009; Kelley et al. 2010). This enzyme is very active in several tissues of aquatic mammals compared with terrestrial ones (Elsner et al. 1998; Wilhelm Filho et al. 2002; Vázquez-Medina et al. 2006). Our results showed an accumulation of mutations in cetaceans compared to pinnipeds and terrestrial mammals. In the cetacean lineage, we also identified positively selected sites with radical modifications near important parts of the protein, similar to previous studies (Tian et al. 2021a). Furthermore, the same study presented evidence of differential evolution of SOD in long/deep and short/shallow divers among cetacean species. Finally, although pinnipeds also present differences in diving habits among species, they might rely not on mutations in the protein sequence but on changes in regulatory regions that affect gene expression levels (Righetti et al. 2014; Martens et al. 2022).

The CAT gene, responsible for catalase production, did not show any signal of differential evolution between aquatic and terrestrial mammals, except for relaxation in cetacean branches. The catalase activity in aquatic mammal tissues is not significantly higher than in terrestrial mammals (Cantú-Medellín et al. 2011; Wilhelm Filho et al. 2002; Vázquez-Medina et al. 2006). Although catalase is present in many tissues, its content is known to be higher in the liver and kidney, where it responds to severe increases in H2O2 (Chance et al. 1979; Michiels et al. 1994) and may be inactivated under constant concentrations of H2O2 (Kirkman and Gaetani 2007). In agreement with previous work that found little divergence between the CAT sequences of different metazoan species (Hewitt and Degnan 2023), our results show that this enzyme is highly conserved even across macroevolutionary changes.

The glutathione system is crucial in eliminating ROS during oxidative stress (Michiels et al. 1994). The enzyme responsible for reducing oxidized glutathione (GSSG) to GSH is glutathione reductase (GR), which is encoded by the GSR gene and known to be more active in aquatic mammals than in semi-aquatic and terrestrial mammals (Vázquez-Medina et al. 2007; Righetti et al. 2014; García-Castañeda et al. 2017). In previous studies, the GSR gene was identified as positively selected in the dolphin lineage (Yim et al. 2014) but not in pinnipeds (Martens et al. 2022). Our findings showed an acceleration of the evolutionary rate in aquatic mammal lineages compared to terrestrial ones, but the methods were not able to identify specific positively selected sites. GSR has likely experienced molecular adaptation in specific lineages of aquatic mammals, as is the case for the family Delphinidae.

Glutathione peroxidases are also highly active in aquatic mammal tissues (Wilhelm Filho et al. 2002; Vázquez-Medina et al. 2006, 2011b; Righetti et al. 2014). This gene family includes glutathione peroxidase 3, encoded by the GPX3 gene. The PSS found in our work and by Tian et al. (2021b) are not exclusive to aquatic mammals. The results of RELAX, which identified relaxed selection in cetacean lineages, may explain the higher omega value in Odontoceti. Besides glutathione peroxidase 3, there are six other forms with different cellular locations and reaction mechanisms. Although GPX3 is similar to glutathione peroxidase 1 (GPX1), it is secreted into plasma, while GPX1 acts inside the cell (Brigelius-Flohé and Maiorino 2013). Previous studies found positively selected sites and an increase in gene copies for GPX1 in cetaceans (Tian et al. 2021b).

The PRDX1 gene is highly conserved among terrestrial mammals, but we found evidence of relaxed selection in cetacean lineages, which is likely responsible for the high omega value in Mysticeti (ω = 0.934). No positively selected sites met the criteria of the complementary tests. Together with our findings, the higher number of copies of the PRDX1 gene in cetaceans, especially Mysticeti, suggests that extra copies of this gene allow a decrease in purifying selection, enabling an accumulation of neutral mutations without impacting gene function (Yim et al. 2014; Zhou et al. 2018).

On the other hand, the PRDX3 gene shows evidence of positive selection, with an acceleration of the evolutionary rate among crown cetaceans and pinnipeds when compared to the ancestral lineages and the presence of a PSS. The BSM results identified many PSS in Eschrichtius robustus, Eubalaena japonica, Megaptera novaeangliae, and Ziphius cavirostris, including site 209, the only PSS with changes exclusive to cetaceans. These species were also artificially grouped in the phylogenetic reconstruction for this gene, indicating that they share mutations beyond the PSS (Supplementary Figs S13–S15). In addition, an intensification of selective pressure was identified in Cetacea. Interestingly, the PRDX3 gene also shows a copy number expansion in cetaceans, although not as large as that of PRDX1 (Yim et al. 2014). Peroxiredoxins are considered conserved proteins among metazoans (Hewitt and Degnan 2023), and our data, combined with other studies, provide evidence for their adaptive evolution in cetacean oxidative stress tolerance.
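The contrast between relaxation (as in PRDX1) and intensification (as in PRDX3) can be illustrated with a minimal sketch. It assumes the usual RELAX parameterization, in which each ω class on the test branches equals the corresponding reference ω raised to a parameter k, so that k < 1 pulls all classes toward neutrality (ω = 1) and k > 1 pushes them away from it. The function name and the ω values below are hypothetical and are not estimates from this study.

```python
def scale_omegas(reference_omegas, k):
    """Return test-branch omegas under the assumed RELAX model: omega_test = omega_ref ** k."""
    return [omega ** k for omega in reference_omegas]


# Hypothetical reference classes: purifying, neutral, and positive selection.
reference = [0.05, 1.0, 4.0]

relaxed = scale_omegas(reference, 0.5)      # k < 1: classes move toward omega = 1
intensified = scale_omegas(reference, 2.0)  # k > 1: classes move away from omega = 1

for label, omegas in (("relaxed", relaxed), ("intensified", intensified)):
    print(label, [round(w, 4) for w in omegas])
```

Under relaxation, the purifying class rises toward 1, which is one way a gene-wide ω close to 1 can arise without positive selection.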

To evaluate which sites were shared exclusively between the groups of aquatic mammals and potentially adaptive, we enriched our dataset with multiple mammalian species. Although the selective pressures we identified in the aquatic mammals differed from those observed in their terrestrial counterparts, we found evidence of convergent mutations between these groups. While sites with identical amino acids are common in mammalian species and likely derived from neutral processes, adaptive modifications in species with similar phenotypic traits are rare (Foote et al. 2015; Chikina et al. 2016). Nevertheless, it is still possible that a shared mutation may be neutral in terrestrial species but confer an advantage in the aquatic environment and therefore be adaptive.

Our results suggest that some of the antioxidant genes evolved more rapidly in Cetacea than in Pinnipedia (Fig. 1), indicating that these genes did not converge in the intensity of selective pressure. As phenotypic convergence may occur at various levels, including amino acid substitutions (Hao et al. 2019), we also examined sites at the same position in independent lineages to identify convergent or parallel substitutions (Zhou et al. 2015). Using this method, we detected site 232 in the PRDX3 gene as a potential target of positive selection, presenting the same amino acid substitution in both Cuvier’s beaked whale (Ziphius cavirostris, Ziphiidae, Cetacea) and the Southern elephant seal (Mirounga leonina, Phocidae, Pinnipedia). Although our dataset includes other known deep-diving species, such as the sperm whale (Physeter catodon, Physeteridae, Cetacea) and the pygmy sperm whale (Kogia breviceps, Kogiidae, Cetacea), the beaked whale and the elephant seal have extraordinary aerobic diving limits without a significant increase in surface periods after longer dives (Hindell et al. 1992; Quick et al. 2020), spending less than 20% of their time in shallow waters (Hooker et al. 2015; Castellini and Mellish 2023).
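The idea of screening an alignment for substitutions shared by independent lineages can be sketched as follows. This is a simplified illustration, not the procedure used in this study: it only flags columns where all chosen foreground species carry the same residue while the remaining species are conserved for a different one, and it does not reconstruct ancestral states on a phylogeny, which a formal test of convergence requires. The function name, the conservation threshold, and the toy sequences are all made up for the example.

```python
from collections import Counter


def shared_foreground_sites(alignment, foreground, min_conservation=1.0):
    """Return (column, background residue, foreground residue) tuples, 1-based columns."""
    background = [sp for sp in alignment if sp not in foreground]
    n_columns = len(next(iter(alignment.values())))
    hits = []
    for col in range(n_columns):
        fg_residues = {alignment[sp][col] for sp in foreground}
        # Foreground species must agree on a single, non-gap residue.
        if len(fg_residues) != 1 or "-" in fg_residues:
            continue
        bg_counts = Counter(
            alignment[sp][col] for sp in background if alignment[sp][col] != "-"
        )
        if not bg_counts:
            continue
        bg_residue, count = bg_counts.most_common(1)[0]
        conservation = count / sum(bg_counts.values())
        fg_residue = next(iter(fg_residues))
        if fg_residue != bg_residue and conservation >= min_conservation:
            hits.append((col + 1, bg_residue, fg_residue))
    return hits


# Toy alignment with invented sequences (not real PRDX3 data).
toy_alignment = {
    "Ziphius_cavirostris": "MKGSA",
    "Mirounga_leonina": "MKGSA",
    "Tursiops_truncatus": "MKGGA",
    "Bos_taurus": "MKGGA",
    "Homo_sapiens": "MKGGA",
}
hits = shared_foreground_sites(
    toy_alignment, {"Ziphius_cavirostris", "Mirounga_leonina"}
)
print(hits)
```

Requiring full conservation in the background is a deliberately strict choice; lowering `min_conservation` would admit sites that also vary among the other species.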

Although the identification of site 232 was supported only by Contrast-FEL, which compares the rates of substitution between species and sites, we considered it relevant due to the evidence of intensified selection in the Cetacea clade and the high conservation of this site in other species in the alignment (Fig. 2). Furthermore, we can infer an impact on protein structure, since it was classified as “Possibly Damaging” among human variants. PRDX3 belongs to the subgroup of peroxiredoxins with 2-Cys active sites, which allows them to assemble into toroid structures, in this case a dodecameric ring (Cao et al. 2005). This structure allows these enzymes to react with H2O2 at normal concentrations and to change conformation when peroxide levels increase, initiating a regulatory signaling pathway (Wood et al. 2003). In bovine PRDX3, site 232 faces the inner part of the ring and is located in the junction area between dimers, surrounded by many hydrophobic residues that participate in the binding (Cao et al. 2005) (Supplementary Fig. S29a, b). In humans, the dodecameric form was shown to be more active than its dimers, with the internal part of the ring playing a key role in controlling enzyme activity (Cao et al. 2007; Yewdall et al. 2018). In the dodecameric bovine model (Cao et al. 2005), we also identified a hydrogen bond between glycine (the ancestral state of site 232) and lysine (site 218 in our alignment). However, we could not compare this with our models because AlphaFold did not place the lysine within an alpha helix, which changed its relative position and affected the inference of hydrogen bonds by ChimeraX. For these reasons, we consider the mutation at site 232 a strong candidate for affecting the quaternary structure of PRDX3, and consequently its activity, in known deep/long-diving species. Future studies could focus on modeling the binding region in these non-model species.

Amino acid substitutions can modify the physical and chemical properties of proteins, ultimately impacting their structure and function (Betts and Russell 2003). In our study, while some sites had benign changes that did not significantly impact protein structure, others presented more radical changes that could affect protein function. For example, site 54 in SOD1 is located near the protein’s binding site, and mutations in this region can influence substrate selectivity and catalytic activity (Morley and Kazlauskas 2005). The presence of radical changes in conserved regions, such as protein domains, suggests directional Darwinian selection, in which residues are pushed toward non-synonymous changes (McClellan 2013). Although our analyses provide valuable insights into the potential effects of these mutations on the proteins, further experimental approaches are required to confirm our findings and to investigate regulatory regions and gene expression. Notably, aquatic mammals exhibit variations in amino acid properties that do not appear to have any negative effects, whereas similar changes in humans have been linked to various diseases (Ratovitski et al. 1999; Levartovsky et al. 2000; Rebelo et al. 2021). Therefore, aquatic mammals may have evolved mechanisms to tolerate and regulate these changes, providing valuable insights for future research into human health.
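The distinction between benign and radical changes can be made concrete with a minimal sketch that labels a substitution as conservative when both residues fall in the same physicochemical class and as radical otherwise. The four-class grouping below (nonpolar, polar uncharged, positively charged, negatively charged) is one common textbook scheme adopted here as an assumption; it is not the classification used in this study, and other groupings would label some substitutions differently.

```python
# Assumed grouping of the 20 standard amino acids by side-chain property.
RESIDUE_CLASSES = {
    "nonpolar": set("GAVLIMPFW"),
    "polar": set("STCYNQ"),
    "positive": set("KRH"),
    "negative": set("DE"),
}


def residue_class(residue):
    """Return the name of the class that contains a one-letter amino acid code."""
    for name, members in RESIDUE_CLASSES.items():
        if residue in members:
            return name
    raise ValueError(f"Unknown amino acid code: {residue!r}")


def classify_substitution(ancestral, derived):
    """Label a substitution as 'identical', 'conservative', or 'radical'."""
    if ancestral == derived:
        return "identical"
    if residue_class(ancestral) == residue_class(derived):
        return "conservative"
    return "radical"


# Illustrative substitutions only (not sites reported in this study).
print(classify_substitution("L", "I"))  # same class: nonpolar
print(classify_substitution("D", "K"))  # negative to positive charge
```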

Conclusion

We found that the colonization of aquatic environments by marine mammals, and the need to cope with the oxidative stress derived from hypoxia during dives, has affected the evolution of genes related to the antioxidant system. We identified divergent acceleration in evolutionary rates between Cetacea and Pinnipedia, meaning that despite similar selective pressures, the genes evolved differently in the two lineages. In addition, genes such as GPX3 and PRDX1 presented differences within the recent evolutionary history and diversification of the cetacean and pinniped lineages. Furthermore, we were able to infer a non-random distribution of positively selected sites, since many were close to important regions of the proteins and carried mutations usually associated with radical changes in amino acid properties and with negative effects in humans. Although the majority of findings point to different strategies for managing ischemia and reperfusion in the aquatic mammal lineages, we identified convergent evolution between two extreme divers at a site located in a binding region of PRDX3 that is involved in forming dodecameric structures. Overall, we established an extensive comparative analysis of antioxidant enzyme evolution in our two lineages of interest, pinpointing sites of interest that could be investigated with additional experimental data, which we were unable to address due to the limitations of our study.